NEW BOUNDS FOR SZEMERÉDI'S THEOREM, II: A NEW BOUND FOR $r_4(N)$



BEN GREEN AND TERENCE TAO 

Abstract. Define $r_4(N)$ to be the largest cardinality of a set $A \subseteq \{1, \dots, N\}$ which
does not contain four elements in arithmetic progression. In 1998 Gowers proved that

$$r_4(N) \ll N(\log\log N)^{-c}$$

for some absolute constant $c > 0$. In this paper (part II of a series) we improve this to

$$r_4(N) \ll N e^{-c\sqrt{\log\log N}}.$$

In part III of the series we will use a more elaborate argument to improve this to

$$r_4(N) \ll N(\log N)^{-c}.$$



To Klaus Roth on his 80th birthday 
1. Introduction 

Notational convention. Throughout the paper the letters $c$, $C$ will denote absolute
constants which could be specified explicitly if desired. These constants will generally
satisfy $0 < c \ll 1 \ll C$. Different instances of the notation, even on the same line,
will typically denote different constants. Occasionally we will want to fix a constant for
the duration of an argument; such constants will be subscripted as $C_0$, $C_1$ and so on.
Any implied constants in the $O$- or $\ll$ notations will depend only on any subscripted
variables. Thus if we say that $f(N) = O_\delta(N)$ we mean that there is a constant $F(\delta)$
such that $f(N) \le F(\delta)N$ for all $N$. The absence of any subscripted variables should
be taken to mean that the implied constant is absolute.

Let $N$ be a large positive integer, and let $k \ge 3$ be fixed. We define $r_k(N)$ to be the
largest cardinality of a set $A \subseteq [N] = \{1, \dots, N\}$ which does not contain $k$ distinct
elements in arithmetic progression.

Klaus Roth proved in 1953 that 

$$r_3(N) \ll N(\log\log N)^{-1}.$$

In particular, $r_3(N) = o(N)$. Since Szemerédi's 1969 proof that $r_4(N) = o(N)$ [21], and
his later proof [22] that $r_k(N) = o_k(N)$ for $k \ge 5$, it has been natural to ask for similarly
effective bounds for these quantities. A first attempt in this direction was made by Roth
in [T^], who provided a new proof that $r_4(N) = o(N)$. A major breakthrough was made
in 1998 by Gowers |UE], who obtained the bound

$$r_k(N) \ll N(\log\log N)^{-c_k}$$

for each $k \ge 4$.

In the meantime, there has been progress on $r_3(N)$. Szemerédi (unpublished) obtained
the bound

$$r_3(N) \ll N e^{-c\sqrt{\log\log N}}, \qquad (1.1)$$

and shortly thereafter Heath-Brown [13] and Szemerédi [23] independently obtained the
bound

$$r_3(N) \ll N(\log N)^{-c}. \qquad (1.2)$$

More recently Bourgain [2] found the best bound currently known, namely

$$r_3(N) \ll N(\log\log N/\log N)^{1/2}.$$

Part I of this series of papers [12] may be consulted for a more extensive discussion of
the history of the problem. Our objective in this series is to bring our knowledge of
$r_4$ more closely into line with the best known bounds for $r_3$. In [12] this was achieved
in the so-called finite field model, in which $[N]$ is replaced by a vector space $\mathbb{F}^n$ over a
finite field. In this paper we instead study subsets of $[N]$ itself, and obtain the analogue
of Szemerédi's unpublished bound (1.1) for $r_4(N)$.

Theorem 1.1 (Main theorem). For all large integers $N$ we have

$$r_4(N) \ll N e^{-c\sqrt{\log\log N}}.$$

In part III of the series we will obtain the analogue of the superior bound (1.2). The
argument will, however, be substantially more technical.

Let us conclude this introduction by mentioning that the best known lower bound for
$r_4(N)$ is essentially the same as that for $r_3(N)$, namely Behrend's 1946 bound pQ

$$r_4(N) \ge r_3(N) \gg N e^{-c\sqrt{\log N}}.$$

Somewhat better bounds of shape

$$r_k(N) \gg N e^{-(\log N)^{c_k}}$$

are known for much larger $k$: see |15| ITTj for details.

We now briefly outline the proof of Theorem 1.1. As with all previous papers obtaining
quantitative bounds for $r_k(N)$, we use the density increment strategy of Roth, a detailed
discussion of which may be found in [7]. The key is to obtain a dichotomy of the following
form.

Proposition 1.2 (Lack of progressions implies density increment). Let $N$ be a large
integer, let $\delta \in (0,1)$, and suppose that $A \subseteq [N]$ has $|A| \ge \delta N$ and contains no
progressions of length 4. Assume that we have the largeness condition $N \ge F(\delta)$ for some
explicit function $F$. Then there exists an arithmetic progression $P \subseteq [N]$ of length at
least $f(N,\delta)$ on which we have the density increment

$$\frac{|A \cap P|}{|P|} \ge \delta + \sigma(\delta).$$

Here $f(N,\delta) > 0$ is an explicit function which goes to $\infty$ as $N \to \infty$ for each fixed $\delta$,
and $\sigma(\delta) > 0$ is an explicit positive quantity depending only on $\delta$.



Any proposition of this type will imply, by iteration, a nontrivial upper bound on $r_4(N)$,
with the precise bound depending on the functions $F(\cdot)$, $f(\cdot,\cdot)$, and $\sigma(\cdot)$. For an actual
calculation of a bound (on $r_4(\mathbb{F}_5^n)$) using this strategy, part I of this series may be
consulted.



If one desires a good bound it is of particular interest to get $f(N,\delta)$ and $\sigma(\delta)$ as
large as possible. The function $F(\delta)$ plays a much less significant role and, at least
for the purposes of a motivating discussion, may be ignored. Gowers' proof that
$r_4(N) \ll N(\log\log N)^{-c}$ proceeds by establishing Proposition 1.2 with $f(N,\delta) \gg N^{c\delta^C}$
and $\sigma(\delta) \gg \delta^C$. The main advance in our paper is to improve the density increment
bound to $\sigma(\delta) \gg \delta$. This has the effect of reducing the number of iterations of
Proposition 1.2 that are required from $C\delta^{-C}$ to $C\log(1/\delta)$. Here is a more precise statement
of what we shall prove.

Proposition 1.3 (Lack of progressions implies density increment). Let $\delta > 0$, and
suppose that $N \ge e^{C\delta^{-C}}$. Let $A$ be a subset of $[N]$ with $|A| \ge \delta N$ such that $A$ contains
no progressions of length 4. Then there exists an arithmetic progression $P$ in $[N]$ of
length $|P| \gg N^{c\delta^C}$ such that we have the density increment

$$\frac{|A \cap P|}{|P|} \ge (1+c)\delta.$$



Let us now quickly show how this implies Theorem 1.1.

Deduction of Theorem 1.1 from Proposition 1.3. Suppose that $A \subseteq [N]$ has size $\delta N$,
and that it does not contain a 4-term progression. We perform an iteration. At the $i$th
step of this iteration we will have a set $A_i \subseteq \{1, \dots, N_i\}$ with size $\delta_i N_i$. This set will be
a linearly rescaled version of a subset of $A$, and so it too does not contain a progression
of length 4. Set $A_0 := A$, $N_0 := N$ and $\delta_0 := \delta$. Now Proposition 1.3 tells us that either

$$N_i \le e^{C\delta_i^{-C}} \qquad (1.3)$$

or else the iteration proceeds and it is possible to choose $N_{i+1}$, $\delta_{i+1}$ and $A_{i+1}$ such that

$$N_{i+1} \gg N_i^{c\delta_i^C}$$

and

$$\delta_{i+1} \ge (1+c)\delta_i.$$

Now as long as the iteration continues we must have $\delta_i \le 1$, and so after $K \le C\log(1/\delta)$
iterations the condition (1.3) must be satisfied. At this point we have

$$N_K \gg N^{(c\delta^C)^{C\log(1/\delta)}},$$

and so we derive the inequality

$$N^{(c\delta^C)^{C\log(1/\delta)}} \le e^{C\delta^{-C}}.$$

After a small amount of rearrangement this leads to the claimed bound

$$r_4(N) \ll N e^{-c\sqrt{\log\log N}}. \qquad \Box$$
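The rearrangement at the end of this deduction can be spelled out as follows. This is only a sketch: the constants $c$, $C$ change from line to line, as the notational convention permits, and we assume $\delta \le 1/2$ so that $\log(1/\delta) \gg 1$ (for $\delta > 1/2$ the first line already bounds $N$ by an absolute constant).

```latex
\begin{align*}
N^{(c\delta^{C})^{C\log(1/\delta)}} \le e^{C\delta^{-C}}
  &\implies (c\delta^{C})^{C\log(1/\delta)}\,\log N \le C\delta^{-C} \\
  &\implies \log\log N \le C\log(1/\delta)\,\log\frac{1}{c\delta^{C}} + \log\bigl(C\delta^{-C}\bigr)
     \le C\log^{2}(1/\delta) \\
  &\implies \log(1/\delta) \ge c\sqrt{\log\log N} \\
  &\implies \delta \le e^{-c\sqrt{\log\log N}} .
\end{align*}
```

Since $A$ was an arbitrary subset of $[N]$ of size $\delta N$ with no 4-term progression, this gives $r_4(N) \le \delta N \ll N e^{-c\sqrt{\log\log N}}$.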

It thus remains to establish Proposition 1.3. Our starting point is our earlier paper
[11], which built upon the original paper of Gowers [3] to provide an inverse $U^3$ theorem
which, among other things, already implies Gowers' bound $r_4(N) \ll N(\log\log N)^{-c}$.
This inverse theorem will be stated properly in later sections, but roughly speaking if
$A \subseteq [N]$ had size $|A| \ge \delta N$ and had no progressions of length 4, then $A$ would have
significant correlation with a certain "local quadratic phase function". An example of
such a function is $n \mapsto e^{2\pi i \alpha n^2}$, though this is not the most general example; the reader
may wish to consult the surveys [HI EE] for further discussion.

This correlation implies that $A$ has a significant density increment (comparable to $\delta^C$) on
a "quadratic Bohr set", that is to say an approximate level set of a local quadratic phase
function. Such a set has size $\sim \delta^C N$. The next step in jlj is then that of linearisation,
in which the Bohr set is partitioned into arithmetic progressions. By the pigeonhole
principle $A$ also has a density increment of $\sim \delta^C$ on one of these progressions. It turns
out that the linearisation can be achieved with progressions of size $\gg N^{c\delta^C}$ which,
as mentioned earlier, is sufficient to give Gowers' bound. We remark that a similar
linearisation step already appears in the earlier work of Roth JH] (cf. Lemma 2.3]).

The main cost in this scheme lies in the linearisation step, which forces one to pass from
an object of size $N$ to an object of size only $N^{c\delta^C}$. To improve upon this scheme we
borrow an idea of Heath-Brown and Szemerédi [13, 23] from the $k = 3$ case. Instead
of finding a quadratic phase function which correlates with $A$ and then linearising, one
adopts a more patient stance and first collects several quadratic phase functions. In
this way a more substantial density increment of $c\delta$ can be obtained. Only after this is
done do we linearise. This procedure of linearising several quadratic phase functions at
once turns out not to be as costly as one might think, and in any case it need only be
done $O(\log(1/\delta))$ times due to the size of the density increment.

In part III of the series we will show that it is possible to be more efficient still, by 
extracting additional gains either from the density increment or from the length of the 
progression on which the increment is obtained. This was carried out in the finite field 
setting in [12] . 

We have mentioned, albeit briefly, the so-called finite field model: the survey jjj may
be consulted for more information. The advantage of working in $\mathbb{F}^n$ as opposed to the
cyclic group $\mathbb{Z}/N\mathbb{Z}$ (which serves as a model for $[N]$) is the availability of subspaces. In
$\mathbb{Z}/N\mathbb{Z}$, and in other abelian groups $G$, one must make do with the notion of Bohr sets,
which may be thought of as approximate subspaces. There are various technical issues
involved in dealing with these, as we shall see later on.

Remark. It is quite likely that the methods here, combined with those in [12], extend to
general finite abelian groups $G$; thus if $r_4(G)$ denotes the largest cardinality $|A|$ of a
set $A \subseteq G$ without any arithmetic progressions of length 4, a slight elaboration of the
arguments here should establish $r_4(G) \ll |G| e^{-c\sqrt{\log\log|G|}}$ for all large $|G|$. We will
however not pursue this matter here.

2. General notation 

Let $A$ be a finite non-empty set and let $f : A \to \mathbb{C}$ be a function. It is convenient, so
as to avoid having to contend with normalising factors, to use the expectation notation

$$\mathbb{E}_A(f) = \mathbb{E}_{x \in A} f(x) := \frac{1}{|A|}\sum_{x \in A} f(x).$$

More complex expressions such as $\mathbb{E}_{x \in A, y \in B} f(x,y)$ are similarly defined. We also define
the $L^p$ norms

$$\|f\|_{L^p(A)} := (\mathbb{E}_A |f|^p)^{1/p}$$

for $1 \le p < \infty$, with the usual convention $\|f\|_{L^\infty(A)} := \sup_{x \in A} |f(x)|$. We say that $f$ is
1-bounded if $\|f\|_{L^\infty(A)} \le 1$.

If $A$, $B$ are finite sets with $B$ non-empty, we write $\mathbb{P}_B(A) := \frac{|A \cap B|}{|B|}$ for the density of $A$
in $B$. If $A$ lies in some ambient space $X$ (for example a group) we use $1_A : X \to \mathbb{R}$ to
denote the indicator function of $A$, that is to say $1_A(x) = 1$ when $x \in A$ and $1_A(x) = 0$
otherwise. We also write $1_{x \in A}$ for $1_A(x)$. Thus for instance $\mathbb{P}_B(A) = \mathbb{E}_B(1_A)$ for all
non-empty $B \subseteq X$.

3. The form $\Lambda$ and the $U^3(\mathbb{Z}/p\mathbb{Z})$ norm

We now begin the proof of Proposition 1.3. It will be convenient to work in a cyclic
group $\mathbb{Z}/p\mathbb{Z}$ of large prime order $p$ rather than on the interval $[N]$. On this cyclic group
$\mathbb{Z}/p\mathbb{Z}$, we introduce the quadrilinear form $\Lambda(f_0, f_1, f_2, f_3)$, defined for four functions
$f_j : \mathbb{Z}/p\mathbb{Z} \to \mathbb{C}$ by

$$\Lambda(f_0, f_1, f_2, f_3) := \mathbb{E}_{x,h \in \mathbb{Z}/p\mathbb{Z}}\, f_0(x) f_1(x+h) f_2(x+2h) f_3(x+3h).$$

This form is clearly pertinent to the task of counting progressions of length 4, and has
appeared in many previous papers on this subject. One can quickly deduce Proposition
1.3, and hence Theorem 1.1, from the following claim.

Theorem 3.1 (Anomalous number of AP4s implies density increment). Let $p$ be a large
prime, let $N$ be an integer between $p/8$ and $p/4$, and let $f : \mathbb{Z}/p\mathbb{Z} \to \mathbb{R}$ be a 1-bounded
non-negative function which vanishes outside of $[N]$. Set $\delta := \mathbb{E}_{[N]}(f)$. Suppose that

$$p > \exp(C_0 \delta^{-C_0}) \qquad (3.1)$$

for some suitably large absolute constant $C_0$, and suppose that

$$|\Lambda(f,f,f,f) - \Lambda(\delta 1_{[N]}, \delta 1_{[N]}, \delta 1_{[N]}, \delta 1_{[N]})| \gg \delta^4. \qquad (3.2)$$

Then we can find an arithmetic progression $P$ in $[N]$ obeying the length bound

$$|P| \gg p^{c\delta^C} \qquad (3.3)$$

and the density increment bound

$$\mathbb{E}_P(f) \ge (1+c)\delta \qquad (3.4)$$

for some $c, C > 0$.

Remark. Strictly speaking, there could be two different notions of an arithmetic pro- 
gression in [N] , one arising from its embedding into the integers Z, and the other arising 
from its embedding into the cyclic group Z/pZ. However, because N < p/4, it is easy to 
see that the two concepts are equivalent; the interval [N] is too short to contain a pro- 
gression that somehow "wraps around" p. (To use some jargon, the two representations 
of [N] are Freiman isomorphic of order 2, which is sufficient to preserve the concept of 
an arithmetic progression; see for instance [2%]-) 

Proof that Theorem 3.1 implies Proposition 1.3. By increasing $\delta$ if necessary we may
assume that $|A| = \delta N$. Choose a prime $p$ between $4N$ and $8N$ (this is of course possible
by Bertrand's Postulate) and take $f := 1_A$, thought of as a function on $\mathbb{Z}/p\mathbb{Z}$. Since $A$
has no progressions of length 4 we easily see that

$$\Lambda(f,f,f,f) = O(1/p),$$

whilst the fact that there are $cN^2(1+o(1))$ four-term progressions in $[N]$, for an absolute
constant $c > 0$, implies that

$$\Lambda(\delta 1_{[N]}, \delta 1_{[N]}, \delta 1_{[N]}, \delta 1_{[N]}) \gg \delta^4.$$

Since we are taking $p$ to be large, we conclude (3.2). Applying Theorem 3.1 we can
find a progression $P$ in $\mathbb{Z}/p\mathbb{Z}$ obeying (3.3) and (3.4), and this suffices for our needs. □
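To make the two counts in this proof concrete, here is a minimal brute-force sketch of the form $\Lambda$ on a small cyclic group. The function name `Lambda`, the prime $p = 41$, the value $N = 8$ and the set $A$ are illustrative choices of ours, not taken from the paper; the only point is that for a set with no 4-term progression just the degenerate progressions ($h = 0$) contribute, giving $\Lambda(1_A,1_A,1_A,1_A) = |A|/p^2 = O(1/p)$.

```python
from itertools import product

def Lambda(f0, f1, f2, f3, p):
    """Average of f0(x) f1(x+h) f2(x+2h) f3(x+3h) over all x, h in Z/pZ."""
    total = sum(f0[x] * f1[(x + h) % p] * f2[(x + 2 * h) % p] * f3[(x + 3 * h) % p]
                for x, h in product(range(p), repeat=2))
    return total / p**2

N, p = 8, 41                      # illustrative values: p is prime and 4N <= p <= 8N
A = {1, 2, 4, 8}                  # a subset of [N] with no 4 distinct elements in progression
f = [1 if x in A else 0 for x in range(p)]
interval = [1 if 1 <= x <= N else 0 for x in range(p)]

# For f = 1_A only the degenerate progressions (h = 0) contribute.
trivial_only = Lambda(f, f, f, f, p)

# Number of pairs (x, h) whose progression x, x+h, x+2h, x+3h lies in [N].
ap_count = round(Lambda(interval, interval, interval, interval, p) * p**2)
```

Because $N < p/4$, no progression counted here wraps around modulo $p$, in line with the remark following Theorem 3.1.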

It remains to prove Theorem 3.1. For the rest of the paper we fix $p$ to be a large prime.
To be able to exploit the hypothesis (3.2), we will need to show that $\Lambda$ is controlled by
either of two norms (when restricted to 1-bounded functions). The first is the $L^1$ norm.

Lemma 3.2 ($L^1$ controls $\Lambda$). Let $f, g : \mathbb{Z}/p\mathbb{Z} \to \mathbb{C}$ be uniformly bounded by some $\alpha > 0$.
Then we have

$$|\Lambda(f,f,f,f) - \Lambda(g,g,g,g)| \le 4\alpha^3 \|f - g\|_{L^1(\mathbb{Z}/p\mathbb{Z})}.$$



Proof. Since $\Lambda$ is quadrilinear we have

$$\Lambda(f,f,f,f) - \Lambda(g,g,g,g) = \Lambda(f-g,f,f,f) + \Lambda(g,f-g,f,f) + \Lambda(g,g,f-g,f) + \Lambda(g,g,g,f-g). \qquad (3.5)$$

The result now follows on applying the triangle inequality and the easily checked bound

$$|\Lambda(f_1,f_2,f_3,f_4)| \le \|f_j\|_{L^1} \sup_{i=1,\dots,4} \|f_i\|_{L^\infty}^3, \qquad (3.6)$$

valid for $j = 1, \dots, 4$. □

The second norm that controls $\Lambda$ is the Gowers $U^3$-norm $\|f\|_{U^3(\mathbb{Z}/p\mathbb{Z})}$ of a function
$f : \mathbb{Z}/p\mathbb{Z} \to \mathbb{C}$, defined as

$$\|f\|_{U^3(\mathbb{Z}/p\mathbb{Z})}^8 := \mathbb{E}_{x,h_1,h_2,h_3 \in \mathbb{Z}/p\mathbb{Z}} \Bigl( f(x)\,\overline{f(x+h_1)}\,\overline{f(x+h_2)}\,\overline{f(x+h_3)}\, f(x+h_1+h_2) \times$$
$$\times f(x+h_2+h_3)\, f(x+h_1+h_3)\, \overline{f(x+h_1+h_2+h_3)} \Bigr).$$

This norm was introduced in [HE] and studied further in such papers as [TUJ [TTJ [TJJ |2H].

As shown in jS] it is indeed a norm on $\mathbb{Z}/p\mathbb{Z}$, but we will not need to know this here.
In fact we only require two facts about the $U^3$-norm. One of these facts is an inverse
theorem, which will be the subject of the next section. The other is that the $U^3$-norm
controls $\Lambda$.

Lemma 3.3 ($U^3$ controls $\Lambda$). Let $f, g : \mathbb{Z}/p\mathbb{Z} \to \mathbb{C}$ be 1-bounded functions on an affine
space $\mathbb{Z}/p\mathbb{Z}$. Then we have

$$|\Lambda(f,f,f,f) - \Lambda(g,g,g,g)| \le 4\|f - g\|_{U^3(\mathbb{Z}/p\mathbb{Z})}.$$

Proof. We employ the same telescoping identity (3.5) that we used to prove Lemma 3.2.
In place of the fairly trivial bound (3.6) we instead apply the Generalized von Neumann
theorem, which in this setting states that

$$|\Lambda(f_1,f_2,f_3,f_4)| \le \|f_j\|_{U^3(\mathbb{Z}/p\mathbb{Z})}$$

for $j = 1, \dots, 4$. This result is proved using three applications of the Cauchy-Schwarz
inequality: the details are given very explicitly in [9, Proposition 1.11]. □
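For very small $p$ the $U^3$-norm can be evaluated directly from its definition, which gives some feel for what it measures. The following is a brute-force sketch under our own illustrative choices (the helper names `e` and `u3_norm`, the prime $p = 7$, and the two test phases are assumptions, not part of the paper). A quadratic phase $e(3x^2/p)$ has $U^3$-norm exactly 1, since its third "derivative" vanishes, whereas a cubic phase has strictly smaller norm.

```python
import cmath
from itertools import product

def e(t):
    """e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * cmath.pi * t)

def u3_norm(f, p):
    """Brute-force Gowers U^3 norm of f : Z/pZ -> C, given as a list of p complex numbers."""
    total = 0
    for x, h1, h2, h3 in product(range(p), repeat=4):
        total += (f[x]
                  * f[(x + h1) % p].conjugate()
                  * f[(x + h2) % p].conjugate()
                  * f[(x + h3) % p].conjugate()
                  * f[(x + h1 + h2) % p]
                  * f[(x + h2 + h3) % p]
                  * f[(x + h1 + h3) % p]
                  * f[(x + h1 + h2 + h3) % p].conjugate())
    # The average is real and non-negative in exact arithmetic; abs() removes rounding noise.
    return abs(total / p**4) ** (1 / 8)

p = 7
quadratic = [e(3 * x * x / p) for x in range(p)]   # a globally quadratic phase
cubic = [e(x**3 / p) for x in range(p)]            # a cubic phase

norm_quadratic = u3_norm(quadratic, p)
norm_cubic = u3_norm(cubic, p)
```

For the cubic phase the eighth power of the norm is the proportion of pairs $(h_1, h_2)$ with $h_1 h_2 = 0$, namely $13/49$ when $p = 7$. The loop costs $p^4$ operations, so this is only usable for toy values of $p$.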

In [H Ej one applied Lemma 3.3 directly to (3.2) in order to obtain the lower bound
$\|f - \delta 1_{[N]}\|_{U^3(\mathbb{Z}/p\mathbb{Z})} \gg \delta^4$. This ultimately led to a density increment of $\gg \delta^C$ for $f$
on some progression. The resulting iteration scheme thus proceeds for $\gg \delta^{-C}$ steps,
which is too long for our purposes. Our approach is to develop a so-called Koopman-von
Neumann structure theorem, which introduces an intermediate approximant $\mathbb{E}(f|\mathcal{B})$
between $f$ and $\delta 1_{[N]}$.

4. The inverse $U^3(\mathbb{Z}/p\mathbb{Z})$ theorem

We now come to the second fact concerning the $U^3(\mathbb{Z}/p\mathbb{Z})$-norm that we shall need,
namely the inverse $U^3$-theorem. This is one of the main results of [11]. There are
three (equivalent) formulations of this inverse theorem: one involving locally quadratic
phase functions, one involving generalized quadratic phases, and one involving 2-step
nilsequences. Our argument would work with the first two of these but not the third
(cf. [11, Theorem 12.7]), which has rather weaker bounds. We use the first formulation
involving locally quadratic phases. This is in a sense the most basic form of the inverse
theorem for $U^3(\mathbb{Z}/p\mathbb{Z})$, since in [11] the other variants are all derived from it. To describe
the result we need some notation.

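Definition 4.1 (Bohr sets). Let $S \subseteq \mathbb{Z}/p\mathbb{Z}$, and let $\rho \in (0,1)$ be a parameter. We
define the (centred) Bohr set $B(S,\rho) \subseteq \mathbb{Z}/p\mathbb{Z}$ to be the set

$$B(S,\rho) := \{x \in \mathbb{Z}/p\mathbb{Z} : \|\xi x/p\|_{\mathbb{R}/\mathbb{Z}} < \rho \text{ for all } \xi \in S\},$$

where $\|x\|_{\mathbb{R}/\mathbb{Z}}$ denotes the distance from $x$ to the nearest integer. More generally, if
$\alpha = (\alpha_\xi)_{\xi \in S}$ is any element in the $|S|$-dimensional torus $(\mathbb{R}/\mathbb{Z})^S$, then we write $B_\alpha(S,\rho)$
for the uncentred Bohr set

$$B_\alpha(S,\rho) := \{x \in \mathbb{Z}/p\mathbb{Z} : \|\xi x/p - \alpha_\xi\|_{\mathbb{R}/\mathbb{Z}} < \rho \text{ for all } \xi \in S\}.$$

We refer to $|S|$ as the rank of the Bohr set, and $\rho$ as the radius.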

Example. The arithmetic progression $[N]$ is an uncentred Bohr set of rank 1, with
$S = \{1\}$, $\alpha_1 = (N+1)/2p$ and $\rho = N/2p$. More generally, any arithmetic progression
is an uncentred Bohr set of rank 1, and conversely. The intersection of $d$ arithmetic
progressions of equal length will be an uncentred Bohr set of rank $d$. (In fact, in a cyclic
group of prime order, this essentially describes all the possible uncentred Bohr sets.)
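The definition and the example above are easy to check by machine on a small cyclic group. The sketch below is ours (the helper names `dist_to_nearest_integer` and `bohr_set`, and the values $p = 41$, $N = 8$, are illustrative assumptions); it uses exact rational arithmetic so that the strict inequality in the definition is tested without rounding issues.

```python
import math
from fractions import Fraction

def dist_to_nearest_integer(t):
    """The norm ||t||_{R/Z} of a rational number t."""
    frac = t - math.floor(t)          # fractional part, in [0, 1)
    return min(frac, 1 - frac)

def bohr_set(S, rho, p, alpha=None):
    """Elements x of Z/pZ with ||xi*x/p - alpha_xi|| < rho for every xi in S.

    alpha=None gives the centred Bohr set B(S, rho)."""
    if alpha is None:
        alpha = {xi: Fraction(0) for xi in S}
    return {x for x in range(p)
            if all(dist_to_nearest_integer(Fraction(xi * x, p) - alpha[xi]) < rho
                   for xi in S)}

p, N = 41, 8
# Centred Bohr set of rank 1: a symmetric interval around 0.
centred = bohr_set({1}, Fraction(1, 10), p)
# The interval [N] as an uncentred Bohr set, with the parameters from the example.
interval = bohr_set({1}, Fraction(N, 2 * p), p, alpha={1: Fraction(N + 1, 2 * p)})
# A rank 2 Bohr set is the intersection of two rank 1 Bohr sets.
rank_two = bohr_set({1, 2}, Fraction(1, 10), p)
```

With these values `interval` is exactly $\{1, \dots, 8\}$, confirming the choice $\alpha_1 = (N+1)/2p$, $\rho = N/2p$.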



The inverse $U^3$-theorem will only require the centred Bohr sets, but we will need the
uncentred Bohr sets in the next section, when we convert the inverse theorem into a
Koopman-von Neumann type structure theorem.

Dealing with Bohr sets is slightly technical. One reason for this is that $|B(S,\rho)|$ is not
guaranteed to depend particularly smoothly on $\rho$. As discovered by Bourgain [2] (see
also jTH Chapter 8]), such a property can be guaranteed for a large supply of $\rho$. To
discuss this issue, the following definition is pertinent.

Definition 4.2 (Regular Bohr sets). Let $S \subseteq \mathbb{Z}/p\mathbb{Z}$ be a set with size $d = |S|$, and
suppose that $0 < \rho < 1/2$. A Bohr set $B(S,\rho)$ is said to be regular if one has

$$(1 - 100d|\kappa|)\,|B(S,\rho)| \le |B(S,(1+\kappa)\rho)| \le (1 + 100d|\kappa|)\,|B(S,\rho)|$$

whenever $|\kappa| \le 1/100d$.



The raison d'être for this definition is a result of Bourgain [2] (see also [11, Lemma 8.2])
which states that for any $S$ and any $\epsilon$ there is at least one regular value of $\rho$ in the
interval $[\epsilon, 2\epsilon]$. This will not concern us here though it was important for the proofs in
[11].

We move swiftly on to some other concepts which are useful in the discussion of the
inverse theorem for the $U^3(\mathbb{Z}/p\mathbb{Z})$-norm.

Definition 4.3 (Linear phase functions). We say that a function $\phi : \mathbb{Z}/p\mathbb{Z} \to \mathbb{R}/\mathbb{Z}$ is a
globally linear phase function if we have

$$\phi(x + h_1 + h_2) - \phi(x + h_1) - \phi(x + h_2) + \phi(x) = 0$$

for all $x, h_1, h_2 \in \mathbb{Z}/p\mathbb{Z}$.

Example. Because $p$ is prime, it is easy to see that a function $\phi : \mathbb{Z}/p\mathbb{Z} \to \mathbb{R}/\mathbb{Z}$ is
globally linear if and only if it takes the form $\phi(x) = \xi x/p + \alpha$ for some $\xi \in \mathbb{Z}/p\mathbb{Z}$ and
$\alpha \in \mathbb{R}/\mathbb{Z}$.

Definition 4.4 (Quadratic phase functions). Let $B \subseteq \mathbb{Z}/p\mathbb{Z}$. We say that a function
$\phi : B \to \mathbb{R}/\mathbb{Z}$ is a locally quadratic phase function on $B$ if we have

$$\phi(x + h_1 + h_2 + h_3) - \phi(x + h_1 + h_2) - \phi(x + h_2 + h_3) - \phi(x + h_1 + h_3)$$
$$+\ \phi(x + h_1) + \phi(x + h_2) + \phi(x + h_3) - \phi(x) = 0$$

whenever $x$, $x + h_1$, $x + h_2$, $x + h_3$, $x + h_1 + h_2$, $x + h_1 + h_3$, $x + h_2 + h_3$, and $x + h_1 + h_2 + h_3$
all lie in $B$.

Example. Every globally linear phase function is locally quadratic. If $\alpha, \beta, \gamma$ are real
numbers, and $N$ is an integer between $p/8$ and $p/4$, then the function $\phi(n) = \alpha n^2 +
\beta n + \gamma \pmod 1$ is a locally quadratic phase function on $[N]$.
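The identity in Definition 4.4 can be tested exhaustively for a small interval. The following sketch is ours: the coefficients, the value $N = 6$ and the helper name `cube_sum` are illustrative assumptions, and we work with integer representatives of $[N]$ (which is legitimate because $N \le p/4$ rules out wrap-around). For a quadratic polynomial the alternating sum over every cube inside $[N]$ vanishes identically, hence also modulo 1, whereas a cubic such as $n^3/7$ fails the identity.

```python
from fractions import Fraction
from itertools import product

def cube_sum(phi, x, h1, h2, h3):
    """Alternating sum of phi over the cube x + {0,h1} + {0,h2} + {0,h3}."""
    return (phi(x + h1 + h2 + h3) - phi(x + h1 + h2) - phi(x + h2 + h3) - phi(x + h1 + h3)
            + phi(x + h1) + phi(x + h2) + phi(x + h3) - phi(x))

N = 6
alpha, beta, gamma = Fraction(3, 7), Fraction(1, 5), Fraction(2, 9)   # arbitrary rationals

def phi(n):
    return alpha * n * n + beta * n + gamma

violations = 0
shifts = range(-N, N + 1)
for x, h1, h2, h3 in product(range(1, N + 1), shifts, shifts, shifts):
    corners = [x + a * h1 + b * h2 + c * h3 for a, b, c in product((0, 1), repeat=3)]
    # Only cubes lying entirely inside [N] are constrained by the definition.
    if all(1 <= v <= N for v in corners) and cube_sum(phi, x, h1, h2, h3) % 1 != 0:
        violations += 1

# For contrast: a cubic phase is not locally quadratic on [N].
cubic_defect = cube_sum(lambda n: Fraction(n**3, 7), 1, 1, 1, 1) % 1
```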

Remark. There are also notions of locally linear phase functions, and globally quadratic 
ones, but we will not need them here. 

We are now ready to state the inverse theorem for the $U^3(\mathbb{Z}/p\mathbb{Z})$-norm in the form that
we shall need it. We write $e(x) := e^{2\pi i x}$ as usual.

Theorem 4.5 (Inverse $U^3(\mathbb{Z}/p\mathbb{Z})$ theorem). Let $f : \mathbb{Z}/p\mathbb{Z} \to \mathbb{C}$ be a 1-bounded function
such that $\|f\|_{U^3(\mathbb{Z}/p\mathbb{Z})} \ge \eta$ for some $\eta \in (0,1)$. Then there exists a regular Bohr set
$B := B(S,\rho)$ with $|S| \ll \eta^{-C}$ and $\rho \gg \eta^C$, and a locally quadratic phase function
$\phi_y : y + B \to \mathbb{R}/\mathbb{Z}$ on $y + B$ for every $y \in \mathbb{Z}/p\mathbb{Z}$, such that

$$\mathbb{E}_{y \in \mathbb{Z}/p\mathbb{Z}} |\mathbb{E}_{t \in y+B} f(t) e(-\phi_y(t))| \gg \eta^C. \qquad (4.2)$$

This is [11, Theorem 2.7], where in fact the explicit value of $C = 2^{24}$ was attained.

Remark. It is unfortunately necessary to deal with locally quadratic phase functions 
rather than the more intuitively natural globally quadratic phase functions; see jU |H1 
ITT] for further discussion of issues of this type, or the paper for a rather different 
perspective on the same phenomenon. 

5. Linear and quadratic factors, and a quadratic Koopman-von Neumann theorem

As in [12], we now use an "energy increment argument" to convert our inverse theorem
to a quadratic structure theorem of Koopman-von Neumann type, inspired by some
ideas from ergodic theory. Part I of the series [12] or the lecture notes [0] may be
consulted for further discussion, and [2Zj gives a more general discussion of structure
theorems and inverse theorems. We first need some more notation.

Definition 5.1 (Factors). Let $W$ be any non-empty finite set. Define a factor (or
$\sigma$-algebra) in $W$ to be a collection $\mathcal{B}$ of subsets of $W$ which is closed under union,
intersection, and complement, and which contains $\emptyset$ and $W$. Define an atom of $\mathcal{B}$ to be
a minimal non-empty element of $\mathcal{B}$; these partition $W$, and indeed in this finitary setting
a factor may be thought of simply as a partition of $W$. If $\mathcal{B}$, $\mathcal{B}'$ are factors in $W$ with
$\mathcal{B} \subseteq \mathcal{B}'$ we say that $\mathcal{B}'$ extends $\mathcal{B}$. More generally, if $\mathcal{B}$, $\mathcal{B}'$ are factors in $W$ we let $\mathcal{B} \vee \mathcal{B}'$
be the smallest common extension, so that the atoms of $\mathcal{B} \vee \mathcal{B}'$ are the intersections of
atoms of $\mathcal{B}$ and atoms of $\mathcal{B}'$. If $\mathcal{B}$ is a factor in $W$ and $W'$ is a subset of $W$, we define the
restriction $\mathcal{B}|_{W'}$ of $\mathcal{B}$ to $W'$ to be the factor of $W'$ formed by intersecting all the sets in
$\mathcal{B}$ with $W'$. If $f : W \to \mathbb{C}$, we let $\mathbb{E}(f|\mathcal{B}) : W \to \mathbb{C}$ denote the conditional expectation

$$\mathbb{E}(f|\mathcal{B})(x) := \mathbb{E}_{\mathcal{B}(x)}(f) \quad \text{for all } x \in W,$$

where $\mathcal{B}(x)$ is the unique atom in $\mathcal{B}$ that contains $x$. Equivalently, $\mathbb{E}(f|\mathcal{B})$ is the orthogonal
projection to the space of $\mathcal{B}$-measurable functions in the Hilbert space $L^2(\mathbb{Z}/p\mathbb{Z})$.
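In the finitary setting a factor is just a partition, so the operations of Definition 5.1 can be written out in a few lines. The sketch below is ours (the names `join` and `conditional_expectation`, the set $W = \{0,\dots,5\}$ and the two partitions are illustrative assumptions); atoms are represented as Python sets and functions as dictionaries.

```python
from fractions import Fraction

def join(atoms1, atoms2):
    """Atoms of the common extension: the non-empty intersections of atoms."""
    return [a & b for a in atoms1 for b in atoms2 if a & b]

def conditional_expectation(f, atoms):
    """E(f|B) as a dict: on each atom, the average of f over that atom."""
    out = {}
    for atom in atoms:
        average = sum(f[x] for x in atom) / len(atom)
        for x in atom:
            out[x] = average
    return out

W = range(6)
f = {x: Fraction(x * x) for x in W}
B1 = [{0, 1, 2}, {3, 4, 5}]          # a factor with two atoms
B2 = [{0, 2, 4}, {1, 3, 5}]          # another factor with two atoms
B12 = join(B1, B2)                   # the factor B1 v B2

g = conditional_expectation(f, B1)
# Projection property: conditioning twice on B1 changes nothing.
g_twice = conditional_expectation(g, B1)
# Tower property: conditioning on the finer factor first gives the same result.
g_tower = conditional_expectation(conditional_expectation(f, B12), B1)
```

The two identities checked here, idempotence and the tower property, are the finitary form of the statement that $\mathbb{E}(\cdot|\mathcal{B})$ is an orthogonal projection.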

We will focus our attention on very structured factors, namely linear and quadratic
factors, which are generated from globally linear and locally quadratic phase functions
respectively. The notation here is inspired by the finite field analogues in [12] but with
one new parameter, a "resolution" $K$, which is needed as a substitute for the small
torsion that one enjoys in the finite field geometry setting. We first need to describe
how to convert a phase function into a factor.

Definition 5.2. Call a phase function irrational if it only takes irrational values. If
$\phi : W \to \mathbb{R}/\mathbb{Z}$ is an irrational phase function on a finite nonempty set $W$ and $K \ge 1$ is
an integer, we define $\mathcal{B}_{\phi,K}$ to be the factor in $W$ whose atoms are the sets $\{x \in \mathbb{Z}/p\mathbb{Z} :
\|\phi(x) - j/K\|_{\mathbb{R}/\mathbb{Z}} < 1/2K\}$ for $j = 0, 1, \dots, K-1$.

Remark. The assumption of irrationality is a minor technicality, used in order to avoid
having to deal with the borderline case when $\|\phi(x) - j/K\|_{\mathbb{R}/\mathbb{Z}}$ is exactly equal to $1/2K$;
in practice we shall be able to use perturbation arguments to work purely with irrational
phase functions.

Definition 5.3 (Linear factors). A linear factor of complexity at most $d$ and resolution
$K$ is any factor $\mathcal{B}$ in $\mathbb{Z}/p\mathbb{Z}$ of the form $\mathcal{B} = \mathcal{B}_{\phi_1,K} \vee \dots \vee \mathcal{B}_{\phi_{d'},K}$, where $0 \le d' \le d$ and
$\phi_1, \dots, \phi_{d'} : \mathbb{Z}/p\mathbb{Z} \to \mathbb{R}/\mathbb{Z}$ are irrational globally linear phase functions.

Remark. From the definitions we see that if $\mathcal{B}$ is a linear factor of complexity at most
$d$ and resolution $K$, then $\mathcal{B}$ has at most $K^d$ atoms, each of which is an uncentred Bohr
set of rank at most $d$ and radius $1/2K$. Also, if $\mathcal{B}'$ is another linear factor of complexity
at most $d'$ and resolution $K$, then clearly $\mathcal{B} \vee \mathcal{B}'$ is a linear factor of complexity at most
$d + d'$ and resolution $K$.

Definition 5.4 (Quadratic factors). Let $B$ be an uncentred Bohr set. A pure quadratic
factor of complexity at most $d$ and resolution $K$ in $B$ is any factor $\mathcal{B}$ in $B$ of the form
$\mathcal{B} = \mathcal{B}_{\phi_1,K} \vee \dots \vee \mathcal{B}_{\phi_{d'},K}$, where $0 \le d' \le d$ and $\phi_1, \dots, \phi_{d'} : B \to \mathbb{R}/\mathbb{Z}$ are irrational
locally quadratic phase functions on $B$. A quadratic factor of complexity at most $(d_1, d_2)$
and resolution $K$ is any pair $(\mathcal{B}_1, \mathcal{B}_2)$ of factors in $W$, where $\mathcal{B}_1$ is a linear factor of
complexity at most $d_1$ and resolution at most $K$, and $\mathcal{B}_2$ is an extension of $\mathcal{B}_1$, whose
restriction to any atom $B$ of $\mathcal{B}_1$ is a pure quadratic factor on $B$ of complexity at most
$d_2$ and resolution at most $K$. We say that one quadratic factor $(\mathcal{B}'_1, \mathcal{B}'_2)$ is a quadratic
extension of another $(\mathcal{B}_1, \mathcal{B}_2)$ if $\mathcal{B}_1 \subseteq \mathcal{B}'_1$ and $\mathcal{B}_2 \subseteq \mathcal{B}'_2$.

Remark. Observe that if $(\mathcal{B}_1, \mathcal{B}_2)$ and $(\mathcal{B}'_1, \mathcal{B}'_2)$ are quadratic factors of resolution $K$
and complexity at most $(d_1, d_2)$ and $(d'_1, d'_2)$ respectively, then their common extension
$(\mathcal{B}_1 \vee \mathcal{B}'_1, \mathcal{B}_2 \vee \mathcal{B}'_2)$ is a quadratic factor of complexity at most $(d_1 + d'_1, d_2 + d'_2)$; this is
ultimately because the restriction of a locally quadratic phase function to a smaller set
remains locally quadratic.

Our next task is to rephrase the inverse theorem, Theorem 4.5, in terms of quadratic
factors. At heart this is really nothing more than an averaging argument, though due
to "edge effects" it is somewhat tedious to write down rigorously.

Theorem 5.5 (Inverse theorem for $U^3(\mathbb{Z}/p\mathbb{Z})$, again). Let $f : \mathbb{Z}/p\mathbb{Z} \to \mathbb{C}$ be a
1-bounded function such that $\|f\|_{U^3(\mathbb{Z}/p\mathbb{Z})} \ge \eta$ for some $\eta \in (0,1)$. Suppose also that $K$
is an integer such that $K \ge C\eta^{-C}$ for some sufficiently large constant $C > 0$. Then
there exists a quadratic factor $(\mathcal{B}_1, \mathcal{B}_2)$ in $\mathbb{Z}/p\mathbb{Z}$ of complexity at most $(O(\eta^{-C}), 1)$ and
resolution $K$ such that

$$\|\mathbb{E}(f|\mathcal{B}_2)\|_{L^1(\mathbb{Z}/p\mathbb{Z})} \gg \eta^C. \qquad (5.1)$$

Proof. Let $S, \rho$ be as in Theorem 4.5. Let $\alpha = (\alpha_\xi)_{\xi \in S}$ be a point on the torus $(\mathbb{R}/\mathbb{Z})^S$
with irrational coefficients (one could choose it randomly, if desired). We then define $\mathcal{B}_1$
to be the $\sigma$-algebra whose atoms are of the form

$$\{x \in \mathbb{Z}/p\mathbb{Z} : \|x\xi/p - \alpha_\xi - j_\xi/K\|_{\mathbb{R}/\mathbb{Z}} < 1/2K \text{ for all } \xi \in S\}$$

where for each $\xi \in S$, $j_\xi$ is an integer between $0$ and $K - 1$. One easily verifies that
$\mathcal{B}_1$ is an irrational linear factor of complexity $|S|$ and resolution $K$, defined by linear
phases $\phi_\xi(x) := x\xi/p - \alpha_\xi$, $\xi \in S$. For each $x \in \mathbb{Z}/p\mathbb{Z}$, let $F(x)$ be the quantity

$$F(x) := \sup_{\phi \in \Phi(x)} |\mathbb{E}_{t \in \mathcal{B}_1(x)} (f(t) e(-\phi(t)))|,$$

where $\mathcal{B}_1(x)$ is the atom of $\mathcal{B}_1$ that contains $x$ and $\Phi(x)$ is the collection of all locally
quadratic phase functions $\phi : \mathcal{B}_1(x) \to \mathbb{R}/\mathbb{Z}$. Thus $F$ measures the maximum correlation
of $f$ with a quadratic phase on the atom $\mathcal{B}_1(x)$. We claim that it suffices to show that

$$\|F\|_{L^1(\mathbb{Z}/p\mathbb{Z})} \gg \eta^C. \qquad (5.2)$$

Suppose that this has been established. Written out in full, it becomes the statement
that

$$\mathbb{E}_{x \in \mathbb{Z}/p\mathbb{Z}} |\mathbb{E}_{t \in \mathcal{B}_1(x)} f(t) e(-\phi_{\mathcal{B}_1(x)}(t))| \gg \eta^C$$

for an appropriate choice of $\phi_{\mathcal{B}_1(x)} \in \Phi(x)$. Modulating each phase by a complex number
$e(\theta_x)$ we may move the modulus signs to the outside, obtaining

$$|\mathbb{E}_{x \in \mathbb{Z}/p\mathbb{Z}} \mathbb{E}_{t \in \mathcal{B}_1(x)} f(t) e(-\phi_{\mathcal{B}_1(x)}(t))| \gg \eta^C.$$

Since the two averaging operations are equivalent to the single averaging $\mathbb{E}_{x \in \mathbb{Z}/p\mathbb{Z}}$ this
becomes

$$|\mathbb{E}_{x \in \mathbb{Z}/p\mathbb{Z}} f(x) e(-\phi_{\mathcal{B}_1(x)}(x))| \gg \eta^C. \qquad (5.3)$$

By perturbing each of the $\phi_{B_1}$ infinitesimally we may assume that the $\phi_{B_1}$ are all
irrational. If we then let $\mathcal{B}_2$ be the extension of $\mathcal{B}_1$ whose restriction to each atom $B_1$ of
$\mathcal{B}_1$ is given by $\mathcal{B}_{\phi_{B_1},K}$, then $(\mathcal{B}_1, \mathcal{B}_2)$ is a quadratic factor of complexity at most $(|S|, 1)$
and resolution $K$, and we have

$$e(-\phi_{\mathcal{B}_1(x)}(x)) = \mathbb{E}(e(-\phi_{\mathcal{B}_1(x)})|\mathcal{B}_2)(x) + O(1/K)$$

for all $x$. It is important to note here that $\phi_{\mathcal{B}_1(x)}$ depends only on the atom $\mathcal{B}_1(x)$ and
not otherwise on $x$ itself.

It follows from this and (5.3) that if $K \ge C\eta^{-C}$ for sufficiently large $C$ then

$$|\langle f, \mathbb{E}(e(\phi_{\mathcal{B}_1(x)})|\mathcal{B}_2) \rangle| \gg \eta^C,$$

where we have written $\langle g_1, g_2 \rangle := \mathbb{E}_{x \in \mathbb{Z}/p\mathbb{Z}}\, g_1(x) \overline{g_2(x)}$. The conditional expectation
operator $g \mapsto \mathbb{E}(g|\mathcal{B}_2)$ is self-adjoint with respect to this inner product, and hence this
implies that

$$|\langle \mathbb{E}(f|\mathcal{B}_2), e(\phi_{\mathcal{B}_1(x)}) \rangle| \gg \eta^C.$$

The desired bound (5.1) is now a consequence of the triangle inequality in the form
$|\langle g_1, g_2 \rangle| \le \|g_1\|_{L^1} \|g_2\|_{L^\infty}$.


It remains, then, to establish (5.2). It is now time to exploit the estimate (4.2), which
we urge the reader to recall now. For any fixed $y \in \mathbb{Z}/p\mathbb{Z}$, we write $\Omega_y$ for the union of
those atoms of $\mathcal{B}_1$ which only partially intersect $y + B$ (thus they are neither contained
in $y + B(S,\rho)$ nor outside of it). We have

$$|\mathbb{E}_{t \in y+B} f(t) e(-\phi_y(t))| \le \sum_{B_1 : B_1 \subseteq y+B} \frac{|B_1|}{|B|}\, |\mathbb{E}_{t \in B_1} f(t) e(-\phi_y(t))| + \mathbb{P}_{y+B}(\Omega_y)$$
$$\le \mathbb{E}_{y+B} F + \mathbb{P}_{y+B}(\Omega_y) \qquad (5.4)$$

(note that $F$ is constant on each atom $B_1$). On the other hand, since $B = B(S,\rho)$, one
easily verifies that

$$\Omega_y \subseteq y + \bigl(B(S, \rho + \tfrac{1}{K}) \setminus B(S, \rho - \tfrac{1}{K})\bigr).$$

Since $B(S,\rho)$ is regular, we conclude that

$$\mathbb{P}_{y+B}(\Omega_y) \ll \frac{d}{\rho K}.$$

Inserting this into (5.4), taking expectations, and using (4.2), we conclude that

$$\|F\|_{L^1(\mathbb{Z}/p\mathbb{Z})} + O\Bigl(\frac{d}{\rho K}\Bigr) \gg \eta^C.$$

Now $d \ll \eta^{-C}$ and $\rho \gg \eta^C$. Thus by taking $K \ge C'\eta^{-C'}$ for large enough $C'$ we obtain
the claim that $\|F\|_{L^1(\mathbb{Z}/p\mathbb{Z})} \gg \eta^C$. □

Let $\mathcal{B}_{\mathrm{triv}}$ denote the rather trivial factor generated by the two atoms $[N]$ and $(\mathbb{Z}/p\mathbb{Z}) \setminus [N]$.
With $f$ as in Theorem 3.1 and this new notation we have $\delta 1_{[N]} = \mathbb{E}(f|\mathcal{B}_{\mathrm{triv}})$. Our next
task is to iterate Theorem 5.5 via an energy increment argument to obtain the following
structural result of "Koopman-von Neumann" type. The blueprint for arguments of
this type is Szemerédi's proof of his regularity lemma in graph theory [22]. For other
examples in additive combinatorics the reader might consult any of [71 I§1 \W\ 1X2"! l2l?l |2"%] . 

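Theorem 5.6 (Quadratic Koopman-von Neumann theorem). Let $f : \mathbb{Z}/p\mathbb{Z} \to [-1, 1]$
be a 1-bounded function, and let $\eta > 0$. Suppose also that $K$ is an integer such that
$K \ge C\eta^{-C}$ for some sufficiently large constant $C > 0$. Then there exists a quadratic
factor $(\mathcal{B}_1, \mathcal{B}_2)$ in $\mathbb{Z}/p\mathbb{Z}$ of complexity at most $(O(\eta^{-C}), O(\eta^{-C}))$ and resolution $K$ such
that

$$\|f - \mathbb{E}(f|\mathcal{B}_2 \vee \mathcal{B}_{\mathrm{triv}})\|_{U^3(\mathbb{Z}/p\mathbb{Z})} \le \eta. \qquad (5.5)$$

Proof. We perform the following algorithm.

• Step 0: Initialize $\mathcal{B}_1 = \mathcal{B}_2 = \{\emptyset, \mathbb{Z}/p\mathbb{Z}\}$. Thus $(\mathcal{B}_1, \mathcal{B}_2)$ is a quadratic factor
with complexity $(0, 0)$ and resolution $K$.

• Step 1: If (5.5) holds, then stop. Otherwise, apply Theorem 5.5 with $f$
replaced¹ by $f - \mathbb{E}(f|\mathcal{B}_2 \vee \mathcal{B}_{\mathrm{triv}})$ to obtain a quadratic factor $(\mathcal{B}'_1, \mathcal{B}'_2)$ of complexity
at most $(O(\eta^{-C}), 1)$ and resolution $K$ such that

$$\|\mathbb{E}(f - \mathbb{E}(f|\mathcal{B}_2 \vee \mathcal{B}_{\mathrm{triv}})|\mathcal{B}'_2)\|_{L^1(\mathbb{Z}/p\mathbb{Z})} \gg \eta^C. \qquad (5.6)$$

• Step 2: Replace $(\mathcal{B}_1, \mathcal{B}_2)$ with $(\mathcal{B}_1 \vee \mathcal{B}'_1, \mathcal{B}_2 \vee \mathcal{B}'_2)$ (thus increasing the complexity of
$(\mathcal{B}_1, \mathcal{B}_2)$ by at most $(O(\eta^{-C}), 1)$), and return to Step 1.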

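Observe from (5.6) and Cauchy-Schwarz that

$$\|\mathbb{E}(f - \mathbb{E}(f|\mathcal{B}_2 \vee \mathcal{B}_{\mathrm{triv}})|\mathcal{B}'_2)\|_{L^2(\mathbb{Z}/p\mathbb{Z})} \gg \eta^C,$$

and hence

$$\|\mathbb{E}(f - \mathbb{E}(f|\mathcal{B}_2 \vee \mathcal{B}_{\mathrm{triv}})|\mathcal{B}_2 \vee \mathcal{B}'_2 \vee \mathcal{B}_{\mathrm{triv}})\|_{L^2(\mathbb{Z}/p\mathbb{Z})} \gg \eta^C.$$

By Pythagoras' theorem we conclude that

$$\|\mathbb{E}(f|\mathcal{B}_2 \vee \mathcal{B}'_2 \vee \mathcal{B}_{\mathrm{triv}})\|_{L^2(\mathbb{Z}/p\mathbb{Z})}^2 - \|\mathbb{E}(f|\mathcal{B}_2 \vee \mathcal{B}_{\mathrm{triv}})\|_{L^2(\mathbb{Z}/p\mathbb{Z})}^2 \gg \eta^C.$$

It follows that every time we perform Step 2, the energy $\|\mathbb{E}(f|\mathcal{B}_2 \vee \mathcal{B}_{\mathrm{triv}})\|_{L^2(\mathbb{Z}/p\mathbb{Z})}^2$
increments by at least $\gg \eta^C$. Since the energy is clearly bounded between $0$ and $1$, the
algorithm can only run for at most $O(\eta^{-C})$ iterations, and the claim easily follows. □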
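The Pythagoras step that drives the energy increment can be checked on a toy example. In the sketch below (ours; the set $W$, the function $f$, the two partitions and the helper names are illustrative assumptions, and the partitions are arbitrary rather than quadratic factors) we verify that when one factor extends another, the gain in energy equals the squared $L^2$ distance between the two conditional expectations, and that the energy of a 1-bounded function stays between 0 and 1.

```python
from fractions import Fraction

def conditional_expectation(f, atoms):
    """E(f|B) as a dict: on each atom, the average of f over that atom."""
    out = {}
    for atom in atoms:
        average = sum(f[x] for x in atom) / len(atom)
        for x in atom:
            out[x] = average
    return out

def l2_norm_squared(g, W):
    """The squared L^2 norm, with the normalised (expectation) convention."""
    return sum(abs(g[x]) ** 2 for x in W) / len(W)

W = list(range(12))
f = {x: Fraction((x * x) % 5, 4) for x in W}           # a 1-bounded function on W
coarse = [set(range(0, 6)), set(range(6, 12))]
fine = [{0, 1, 2}, {3, 4, 5}, {6, 7, 8}, {9, 10, 11}]   # an extension of `coarse`

g_coarse = conditional_expectation(f, coarse)
g_fine = conditional_expectation(f, fine)

energy_coarse = l2_norm_squared(g_coarse, W)
energy_fine = l2_norm_squared(g_fine, W)
# Pythagoras: the energy gain is the squared distance between the two projections.
gap = l2_norm_squared({x: g_fine[x] - g_coarse[x] for x in W}, W)
```

Since each pass through Step 2 raises the energy by $\gg \eta^C$ while the energy never exceeds 1, the number of passes is $O(\eta^{-C})$, which is exactly the counting used in the proof.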

If we apply this theorem (with $\eta := c\delta^4$ for some small $c$) and Lemma 3.3 to the situation
in Theorem 3.1 we obtain the following corollary.

Corollary 5.7 (Anomalous AP4 count on a quadratic factor). Let the assumptions be as
in Theorem 3.1. Then there exists a quadratic factor $(\mathcal{B}_1, \mathcal{B}_2)$ in $\mathbb{Z}/p\mathbb{Z}$ of complexity at
most $(O(\delta^{-C}), O(\delta^{-C}))$ and resolution $O(\delta^{-C})$ such that the function $g := \mathbb{E}(f|\mathcal{B}_2 \vee \mathcal{B}_{\mathrm{triv}})$
obeys

$$|\Lambda(g,g,g,g) - \Lambda(\delta 1_{[N]}, \delta 1_{[N]}, \delta 1_{[N]}, \delta 1_{[N]})| \gg \delta^4. \qquad (5.7)$$

Thus we have replaced the original function $f$ by the more structured function $g$. Note
that $\delta 1_{[N]} = \mathbb{E}(g \mid \mathcal{B}_{\mathrm{triv}})$. From this it is not hard to obtain, under the assumption that
$f$ has an anomalous count of 4-term progressions, a substantial density increment for $f$
on a quadratic Bohr set.

Corollary 5.8 (Density increment on quadratic Bohr set). Let the assumptions be as
in Theorem 3.1. Then there exists a quadratic factor $(\mathcal{B}_1,\mathcal{B}_2)$ in $\mathbb{Z}/p\mathbb{Z}$ of complexity at
most $(O(\delta^{-C}), O(\delta^{-C}))$ and resolution $O(\delta^{-C})$, and an atom $B_2$ of $\mathcal{B}_2 \vee \mathcal{B}_{\mathrm{triv}}$ of density
$\mathbb{P}_{\mathbb{Z}/p\mathbb{Z}}(B_2) \gg \exp(-O(\delta^{-C}))$ and contained in $[N]$ such that

\[ \mathbb{E}_{B_2}(f) \geq (1 + c)\delta \]

for some absolute constant $c > 0$.



¹ This function is bounded pointwise by 2. It is clear that a trivial rescaling of Theorem 5.5 applies
to such functions.






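Proof. Let $(\mathcal{B}_1,\mathcal{B}_2)$ and $g$ be as in Corollary 5.7. The facts that $[N]$ is measurable in
$\mathcal{B}_2 \vee \mathcal{B}_{\mathrm{triv}}$ and that $f$ is supported on $[N]$ guarantee that $g$ is also supported on $[N]$. Let
$\Omega$ denote the set where $g \geq (1 + c)\delta$, where $c > 0$ is a small constant to be chosen later,
and let $g' := (1 - 1_\Omega)g$. From Lemma 3.2 we have

\[ |\Lambda(g,g,g,g) - \Lambda(g',g',g',g')| \leq 4\,\mathbb{P}_{\mathbb{Z}/p\mathbb{Z}}(\Omega) \]

and

\[ |\Lambda(g',g',g',g') - \Lambda(\delta 1_{[N]}, \delta 1_{[N]}, \delta 1_{[N]}, \delta 1_{[N]})| \leq 8\delta^3 \|g' - \delta 1_{[N]}\|_{L^1(\mathbb{Z}/p\mathbb{Z})}. \]

Furthermore we evidently have

\[ \|g' - \delta 1_{[N]}\|_{L^1(\mathbb{Z}/p\mathbb{Z})} \leq \|g - \delta 1_{[N]}\|_{L^1(\mathbb{Z}/p\mathbb{Z})} + \mathbb{P}_{\mathbb{Z}/p\mathbb{Z}}(\Omega). \]

Combining these three estimates together with (5.7) we obtain

\[ \delta^3 \|g - \delta 1_{[N]}\|_{L^1(\mathbb{Z}/p\mathbb{Z})} + \mathbb{P}_{\mathbb{Z}/p\mathbb{Z}}(\Omega) \gg \delta^4. \]

Now observe that the positive part $(g - \delta 1_{[N]})_+$ of $g - \delta 1_{[N]}$ can only exceed $c\delta$ on $\Omega$,
and hence has a total $L^1$ norm of at most $c\delta + \mathbb{P}_{\mathbb{Z}/p\mathbb{Z}}(\Omega)$. Since $g - \delta 1_{[N]}$ also has mean
zero, we conclude that

\[ \|g - \delta 1_{[N]}\|_{L^1(\mathbb{Z}/p\mathbb{Z})} = 2\|(g - \delta 1_{[N]})_+\|_{L^1(\mathbb{Z}/p\mathbb{Z})} \ll c\delta + \mathbb{P}_{\mathbb{Z}/p\mathbb{Z}}(\Omega). \]

If $c$ is chosen small enough, we deduce that

\[ \mathbb{P}_{\mathbb{Z}/p\mathbb{Z}}(\Omega) \gg \delta^4. \]

Now $\mathcal{B}_2 \vee \mathcal{B}_{\mathrm{triv}}$ has complexity and resolution $O(\delta^{-C})$, and hence contains at most
$\exp(O(\delta^{-C}))$ atoms. By the pigeonhole principle we can therefore find an atom $B_2$
of $\mathcal{B}_2 \vee \mathcal{B}_{\mathrm{triv}}$ which is contained in $\Omega$ and which has $\mathbb{P}_{\mathbb{Z}/p\mathbb{Z}}(B_2) \gg \exp(-O(\delta^{-C}))$. By
construction we have $\mathbb{E}_{B_2}(f) \geq (1 + c)\delta$, and the claim follows. □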

Our sole remaining task is to take this density increment for $f$ on a "quadratic Bohr
set" and use it to obtain a similar density increment for $f$ on an arithmetic progression
(Theorem 3.1). This we do by splitting the quadratic Bohr set into a union of
progressions, a process we call linearisation.



6. Linearisation of quadratic Bohr sets 

We will decompose a quadratic Bohr set into a union of progressions. Our method 
for doing this does not naturally output progressions of equal sizes, and the following 
simple variant of the pigeonhole principle is designed to ensure that there is at least one 
progression which is quite long and on which $f$ has a substantial density increment.

Lemma 6.1 (Pigeonhole principle). Let $B$ be a non-empty set, and let $B = A_1 \cup \ldots \cup A_m$
be a partition of $B$ into $m$ disjoint sets. Let $f : B \to \mathbb{R}^+$ be a 1-bounded nonnegative
function. Then for any $\varepsilon > 0$, there exists $i \in \{1,\ldots,m\}$ such that $\mathbb{P}_B(A_i) \geq \varepsilon/m$ and

\[ \mathbb{E}_{A_i}(f) \geq \mathbb{E}_B(f) - \varepsilon. \]

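Proof. Let $\Omega$ be the union of all the $A_i$ for which $\mathbb{P}_B(A_i) \leq \varepsilon/m$. We obviously have
$\mathbb{P}_B(\Omega) \leq \varepsilon$. From Bayes' identity and the fact that $0 \leq f \leq 1$ we have

\[ \mathbb{E}_B(f) = \mathbb{P}_B(\Omega)\mathbb{E}_\Omega(f) + (1 - \mathbb{P}_B(\Omega))\mathbb{E}_{B \setminus \Omega}(f) \leq \mathbb{P}_B(\Omega) + \mathbb{E}_{B \setminus \Omega}(f), \]

and it follows that

\[ \mathbb{E}_{B \setminus \Omega}(f) \geq \mathbb{E}_B(f) - \varepsilon. \]

Partitioning $B \setminus \Omega$ into its constituent sets $A_i$, the claim follows from the usual pigeonhole
principle. □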
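
The proof of Lemma 6.1 is effectively a selection procedure, which can be sketched as follows. The function name `pigeonhole_cell` and the example data are hypothetical; exact rational arithmetic is used so that the comparisons are not affected by rounding.

```python
from fractions import Fraction

def pigeonhole_cell(cells, f, eps):
    """Select a cell as in Lemma 6.1.

    cells: list of disjoint lists partitioning B; f: dict x -> value in [0, 1].
    Returns (i, P_B(A_i), E_{A_i} f), or None if every cell is small.
    """
    m = len(cells)
    size_B = sum(len(cell) for cell in cells)
    best = None
    for i, cell in enumerate(cells):
        prob = Fraction(len(cell), size_B)
        if prob <= Fraction(eps) / m:
            continue  # small cells form the exceptional set Omega of the proof
        mean = sum(Fraction(f[x]) for x in cell) / len(cell)
        if best is None or mean > best[2]:
            best = (i, prob, mean)  # keep the large cell with the largest average
    return best

# Hypothetical example: B = {0, ..., 11} split into three cells.
cells = [[0], [1, 2, 3], list(range(4, 12))]
f = {0: Fraction(1)}
f.update({x: Fraction(1, 2) for x in (1, 2, 3)})
f.update({x: Fraction(1, 4) for x in range(4, 12)})
eps = Fraction(1, 4)

index, prob, mean = pigeonhole_cell(cells, f, eps)
mean_B = sum(f.values()) / 12
```

In this example the singleton cell carries the largest average but is too small to be admissible, so the procedure returns the middle cell instead.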



The next result provides the splitting of a quadratic Bohr set into progressions, and is
the main result of the section.

Proposition 6.2 (Linearisation of quadratic Bohr sets). Let $(\mathcal{B}_1,\mathcal{B}_2)$ be a quadratic factor
in $\mathbb{Z}/p\mathbb{Z}$ of complexity at most $(d_1,d_2)$ and some resolution $K$, and let $B_2$ be an atom of $\mathcal{B}_2$. Then
one can partition $B_2 \cap [N]$ as the union of $\ll d_2^{O(d_2)} N^{1 - c/(d_1+1)(d_2+1)^3}$ disjoint arithmetic
progressions in $\mathbb{Z}/p\mathbb{Z}$.



Proof of Theorem 3.1 assuming Proposition 6.2. Suppose that $f$ satisfies the conditions
of Theorem 3.1. By Corollary 5.8 we know that there is a quadratic factor $(\mathcal{B}_1,\mathcal{B}_2)$
in $\mathbb{Z}/p\mathbb{Z}$ of complexity at most $(O(\delta^{-C}), O(\delta^{-C}))$ and resolution $O(\delta^{-C})$, and an atom
$B_2 \subseteq [N]$ of $\mathcal{B}_2 \vee \mathcal{B}_{\mathrm{triv}}$, having density at least $\exp(-O(\delta^{-C}))$ in $\mathbb{Z}/p\mathbb{Z}$, on which the
average of $f$ is at least $(1 + c_0)\delta$. Using Proposition 6.2 we may write $B_2$ as the
union of $\exp(O(\delta^{-C}))N^{1 - c\delta^{C}}$ progressions. Taking $\varepsilon := c_0\delta/2$ in Lemma 6.1 we obtain
a progression of length at least $\exp(-O(\delta^{-C}))N^{c\delta^{C}}$ on which $f$ has average at least
$(1 + \tfrac{1}{2}c_0)\delta$. To complete the proof of the theorem, we need to make sure that this length
is in fact $\gg N^{c'\delta^{C'}}$ for absolute constants $c', C' > 0$. This may be ensured by taking the
absolute constant in the condition (3.1) to be sufficiently large. □

It remains to prove Proposition 6.2. We first deal with the linear component of the
factor $(\mathcal{B}_1,\mathcal{B}_2)$. Since $\mathcal{B}_2$ extends $\mathcal{B}_1$, there is a unique atom $B_1$ in $\mathcal{B}_1$ which contains
$B_2$.

Proposition 6.3 (Linearisation of linear Bohr sets). Let $\mathcal{B}_1$ be a linear factor of complexity
$d_1$ and resolution $K$. Let $B_1$ be an atom in $\mathcal{B}_1$. Then one can partition $B_1 \cap [N]$
as the union of $\ll 2^{d_1} N^{1 - 1/(d_1+1)}$ arithmetic progressions.



Proof. We can write $B_1$ as an uncentred Bohr set $B_a(S, 1/2K)$, where $|S| \leq d_1$ and
$a \in (\mathbb{R}/\mathbb{Z})^S$. Using the Kronecker approximation theorem (Proposition A.1) we can
find a non-zero $r \in \mathbb{Z}/p\mathbb{Z}$ such that

\[ \|\xi r/p\|_{\mathbb{R}/\mathbb{Z}} \ll N^{-1/(d_1+1)} \]

for all $\xi \in S \cup \{1\}$. If we then partition $\mathbb{Z}/p\mathbb{Z}$ into $O(N^{1 - 1/(d_1+1)})$ arithmetic progressions
of common difference $r$ and length $O(N^{1/(d_1+1)})$, we see that the intersection of each of
these progressions with $B_1 \cap [N]$ will be the union of no more than $2^{d_1}$ smaller arithmetic
progressions, also of step $r$, and the claim follows. □

This last proposition improves our situation considerably, since it is much easier to
understand quadratic phases on a progression than it is quadratic phases on a Bohr set. 

Proposition 6.4 (Linearisation of pure quadratic Bohr sets). Let $P$ be an arithmetic
progression in $\mathbb{Z}/p\mathbb{Z}$, and let $\phi_1,\ldots,\phi_d : P \to \mathbb{R}/\mathbb{Z}$ be locally quadratic irrational phase
functions on $P$. Consider the factor $\mathcal{B}_{\phi_1,K} \vee \ldots \vee \mathcal{B}_{\phi_d,K}$ of resolution $K$ defined by these
phase functions (cf. Definition ^. ^ ). Then for any resolution $K$, every atom $B_2 \subseteq P$ of
this factor can be partitioned as the union of $\ll d^{O(d)}|P|^{1 - c/(1+d)^3}$ disjoint arithmetic progressions.

Let us now see why Proposition 6.3 and Proposition 6.4 together imply Proposition 6.2.
First let us take all the progressions arising from Proposition 6.3 which are rather small,
say having length $O(N^{1/2(d_1+1)})$. We can partition these progressions in the most trivial
way into singletons, ending up with at most $O(N^{1 - 1/2(d_1+1)})$ single-element progressions
in this way. As for each longer progression $P$ in $B_1 \cap [N]$, of length $\gg N^{1/2(d_1+1)}$,
we apply Proposition 6.4 to see that $P \cap B_2$ is the union of
$\ll d_2^{O(d_2)}|P|^{1 - c/(1+d_2)^3} \ll d_2^{O(d_2)} N^{-c/(1+d_1)(1+d_2)^3}|P|$ disjoint arithmetic progressions.
Assembling all of these progressions together as $P$ varies, we obtain
$O(d_2^{O(d_2)} N^{1 - c/(1+d_1)(1+d_2)^3})$ disjoint progressions
in total, and Proposition 6.2 follows.

It remains to prove Proposition 6.4. A result of this type, in which there is just a single
quadratic phase, may be found in [1]. Here, however, we are dealing with $d$ quadratics
rather than just one, and will have to take a little care to make sure that our exponents
depend only polynomially on $d$ rather than exponentially. Because of this, we cannot,
for example, simply iterate the analogous single-quadratic results from [1]. As a first
step we may apply an affine linear transformation to $P$ and assume that $P = [1, M]$ for
some $M$, $1 \leq M \leq p$. We can also take $d \geq 1$ since the $d = 0$ case is trivial. It is easy
to see, straight from the definition of a quadratic phase, that each $\phi_j : [1,M] \to \mathbb{R}/\mathbb{Z}$
takes the form

\[ \phi_j(n) = \alpha_j n^2 + \beta_j n + \gamma_j \]

for some $\alpha_j, \beta_j, \gamma_j \in \mathbb{R}/\mathbb{Z}$. The set $B_2$ thus takes the form

\[ \{n \in [1,M] : \|\alpha_j n^2 + \beta_j n + \tilde\gamma_j\|_{\mathbb{R}/\mathbb{Z}} < 1/2K \text{ for } j = 1,\ldots,d\} \]

where $\tilde\gamma_j$ is some other element of $\mathbb{R}/\mathbb{Z}$. Our objective is to partition this set into
$\ll d^{O(d)} M^{1 - c/d^3}$ disjoint arithmetic progressions.

The first step, as in [1], is to find a scale $r$ for which the effects of the quadratic
components $\alpha_j n^2$ of each phase are locally negligible. Applying Proposition A.2 we can
locate an integer $r$, $1 \leq r \leq \sqrt{M}$, such that

\[ \|\alpha_j r^2\|_{\mathbb{R}/\mathbb{Z}} \ll d M^{-c_0/d^2} \tag{6.1} \]

whenever $1 \leq j \leq d$, where $c_0 > 0$ is an absolute constant. Now we can partition $[1, M]$
into at most $M^{1 - c_0/4d^2}$ arithmetic progressions of step $r$ and lengths $\sim M^{c_0/4d^2}$ (that
is, bounded above and below by absolute constants times this). It will suffice to show
that, for each such arithmetic progression $P$, the set $P \cap B_2$ can be partitioned into
$\ll d^{O(d)}|P|^{1 - 1/2d}$ arithmetic progressions.

Let us fix one of these progressions $P = \{a, a + r, \ldots, a + (k-1)r\}$, where $k \sim M^{c_0/4d^2}$.
From (6.1) we have

\[ \|\alpha_j r^2\|_{\mathbb{R}/\mathbb{Z}} \ll d k^{-4} \quad \text{for } j = 1,\ldots,d. \tag{6.2} \]

The set $P \cap B_2$ can be written as

\[ \{a + ir : \|i^2 \alpha_j r^2 + \beta_{j,P}\, i + \gamma_{j,P}\|_{\mathbb{R}/\mathbb{Z}} < 1/2K \text{ for } j = 1,\ldots,d\} \]

where $\beta_{j,P}, \gamma_{j,P}$ are some real numbers depending on $j$ and $P$. Now we use Kronecker's
theorem (Proposition A.1) to find a positive integer $s \leq \sqrt{k}$ such that

\[ \|\beta_{j,P}\, s\|_{\mathbb{R}/\mathbb{Z}} \leq k^{-1/2d} \tag{6.3} \]

for all $j \in \{1,\ldots,d\}$. We now partition $P$ into $\ll k^{1 - 1/2d}$ arithmetic progressions of
step $rs$ and length $\ll k^{1/2d}$. Consider a single such progression $Q$. It can be written as

\[ Q = \{a + (b + ts)r : 1 \leq t \leq T\} \]

for some $b \leq k$ and some $T \ll k^{1/2d}$, and its intersection with $B_2$ can be written as

\[ \{a + (b + ts)r : 1 \leq t \leq T;\ \|(b + ts)^2 \alpha_j r^2 + \beta_{j,P}(b + ts) + \gamma_{j,P}\|_{\mathbb{R}/\mathbb{Z}} < 1/2K \text{ for } j = 1,\ldots,d\}. \]

The expression $(b + ts)^2 \alpha_j r^2 + \beta_{j,P}(b + ts) + \gamma_{j,P}$ can be rewritten (modulo 1) as

\[ t^2 s^2 \{\alpha_j r^2\} + t\,(2bs\{\alpha_j r^2\} + \{\beta_{j,P}\, s\}) + c_{j,P,Q} \]

where $\{x\} \in (-1/2, 1/2]$ is the difference between $x$ and the nearest integer to $x$, and
$c_{j,P,Q}$ is some real number. Observe from (6.2), (6.3) and the bounds $s \leq \sqrt{k}$, $b \leq k$,
$T \ll k^{1/2d}$ that the coefficients of $t^2$ and $t$ in this quadratic polynomial are $O(d/T^2)$ and
$O(d/T)$ respectively. Thus, for each fixed $j$, the set of values $t$ for which this expression
has an $\mathbb{R}/\mathbb{Z}$ norm less than $1/2K$ is the union of $O(d)$ intervals (arithmetic progressions
of step 1). This means that $Q \cap B_2$ is the union of at most $O(d)^d \ll d^{O(d)}$ intervals,
and thus $P \cap B_2$ can be partitioned into $\ll d^{O(d)} k^{1 - 1/2d}$ progressions as desired. This
concludes the proof of Proposition 6.4 and hence, by earlier reductions, that of our main
theorem. □
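
The last step of the argument, that a quadratic phase with coefficients $O(1/T^2)$ and $O(1/T)$ lies close to an integer only on a bounded number of intervals, can be illustrated numerically. This is a sketch with hypothetical parameter values (the coefficients, $T$ and $K$ below are chosen for illustration only); it counts the maximal runs of consecutive $t \in [1,T]$ with $\|At^2 + Bt + C\|_{\mathbb{R}/\mathbb{Z}} < 1/2K$.

```python
import math

def dist_to_int(x):
    """Distance from x to the nearest integer, i.e. the R/Z norm of x."""
    return abs(x - round(x))

def count_runs(flags):
    """Number of maximal runs of consecutive True values."""
    count, previous = 0, False
    for flag in flags:
        if flag and not previous:
            count += 1
        previous = flag
    return count

def intervals_near_integers(A, B, C, T, K):
    """Count the intervals of t in [1, T] on which ||A t^2 + B t + C|| < 1/(2K)."""
    flags = [dist_to_int(A * t * t + B * t + C) < 1 / (2 * K) for t in range(1, T + 1)]
    return count_runs(flags)

T, K = 1000, 8
# Slowly varying phase: coefficients of size 1/T^2 and 1/T, as after linearisation.
slow = intervals_near_integers(1 / T**2, 1 / T, 0.3, T, K)
# Generic phase with coefficients of size 1, for contrast.
fast = intervals_near_integers(math.sqrt(2), math.sqrt(3), 0.3, T, K)
```

With these values the slowly varying phase increases monotonically from about 0.3 to 2.3, so it passes close to the integers 1 and 2 only, giving two intervals; the generic phase fragments into many short runs, which is why the reduction to small coefficients is needed.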



Appendix A. Simultaneous quadratic recurrence 



We recall the well-known Kronecker approximation theorem: 

Proposition A.1 (Kronecker approximation theorem). Let $\alpha_1,\ldots,\alpha_d$ be real numbers,
and let $N \geq 1$ be an integer. Then there exists an integer $n$, $1 \leq n \leq N$, such that

\[ \|n\alpha_j\|_{\mathbb{R}/\mathbb{Z}} \ll N^{-1/d} \quad \text{for } j = 1,\ldots,d. \tag{A.1} \]



This is easily deduced from the pigeonhole principle, partitioning the torus $(\mathbb{R}/\mathbb{Z})^d$
into fewer than $N$ regions of diameter $O(N^{-1/d})$ each, and considering the orbit of
$(n\alpha_1,\ldots,n\alpha_d)$. The objective of this appendix is to prove the following quadratic
analogue of the above theorem, due to Schmidt [20].

Proposition A.2 (Simultaneous quadratic recurrence). Let $\alpha_1,\ldots,\alpha_d$ be real numbers,
and let $N \geq 1$ be an integer. Then there exists an integer $1 \leq n \leq N$ such that

\[ \|n^2\alpha_j\|_{\mathbb{R}/\mathbb{Z}} \ll d N^{-c/d^2} \quad \text{for } j = 1,\ldots,d. \tag{A.2} \]

Here $c > 0$ is an absolute constant.

In actual fact Schmidt shows that one may satisfy

\[ \|n^2\alpha_j\|_{\mathbb{R}/\mathbb{Z}} \ll_{d,\varepsilon} N^{-1/(d^2+d) + \varepsilon}. \]

The exponent here is of course more precise than the one we quote, but it is of critical
importance for our work that we have some understanding of the dependence on $d$ of the
implied constant in the $\ll_{d,\varepsilon}$. We could allow it to be (say) $e^{O(d^C)}$, but not much worse.
Schmidt's argument is explicit and effective enough that such bounds can probably
be extracted with some effort from [20]; but for the convenience of the reader we shall
instead provide a complete and self-contained proof of Proposition A.2 in this appendix.
Note that we only require an exponent of shape $N^{-c/d^C}$ in (A.2), which is somewhat
weaker than what [20] gives, but we do not know of a way to obtain such an exponent
which does not follow Schmidt's argument. An exponent $N^{-1/C^d}$ may be obtained by
the simpler device of iteratively applying the case $d = 1$ of Proposition A.2 (see jH] for
details), but this does not suffice for our purposes here.
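
For small parameters both recurrence statements can be explored by brute force. The sketch below is an illustration only: the function name `best_recurrence` and the choice $\alpha = (\sqrt{2}, \sqrt{3})$ are hypothetical, and the search says nothing about the dependence on $d$ that the appendix is concerned with. For the linear case we take $N = Q^d$, for which the pigeonhole argument gives an $n \leq N$ with every $\|n\alpha_j\|_{\mathbb{R}/\mathbb{Z}} < 1/Q$.

```python
import math

def dist_to_int(x):
    """Distance from x to the nearest integer, i.e. the R/Z norm of x."""
    return abs(x - round(x))

def best_recurrence(alphas, N, power):
    """Search 1 <= n <= N for the n minimising max_j ||n^power * alpha_j||."""
    best_n = 1
    best_err = max(dist_to_int(a) for a in alphas)
    for n in range(2, N + 1):
        err = max(dist_to_int(n ** power * a) for a in alphas)
        if err < best_err:
            best_n, best_err = n, err
    return best_n, best_err

alphas = [math.sqrt(2), math.sqrt(3)]  # hypothetical frequencies
Q, d = 10, len(alphas)
N = Q ** d

n_lin, err_lin = best_recurrence(alphas, N, power=1)    # Kronecker / Dirichlet case
n_quad, err_quad = best_recurrence(alphas, N, power=2)  # quadratic case
```

The linear error is guaranteed to be below $1/Q = N^{-1/d}$; no comparably simple guarantee is available for the quadratic error, which is what Proposition A.2 supplies.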

Let us begin by sketching some features of Schmidt's argument. Suppose one wishes
to find an $n \leq N$ such that $\|n^2\alpha_j\|_{\mathbb{R}/\mathbb{Z}} \leq \varepsilon$ for $j = 1,\ldots,d$. With Weyl's well-known
equidistribution argument in mind, it is natural to take a smooth function $\chi$ which
approximates the characteristic function of the cube $[-\varepsilon,\varepsilon]^d \subseteq (\mathbb{R}/\mathbb{Z})^d$ and then evaluate

\[ \sum_{n \leq N} \chi(n^2\alpha_1,\ldots,n^2\alpha_d) \]

by expanding $\chi$ as a Fourier series on $\mathbb{Z}^d$. Using Weyl's inequality for quadratic phases,
which we will discuss shortly, such a procedure provides a good (and, in particular,
positive) estimate provided that there are no "diophantine" relations amongst the $\alpha_j$.
Problems are encountered when, for example,

\[ \|r_1\alpha_1 + \ldots + r_d\alpha_d\|_{\mathbb{R}/\mathbb{Z}} \]

is small for smallish integers $r_j$. However it turns out that if there is such a relation then
it may be used to essentially reduce the dimension of the problem by one, so that one may
proceed inductively. In order to make the induction efficient one cannot work simply
with cubes $[-\varepsilon,\varepsilon]^d$. Instead one must work with a larger class of domains, such as
arbitrary symmetric convex bodies $K$. Using some arguments in the geometry of numbers
or in finite-dimensional Banach space theory one may approximate $K$ by an ellipsoid $\tilde K$.
Thus one is interested in whether there is $n \leq N$ such that $(n^2\alpha_1,\ldots,n^2\alpha_d) \in \tilde K + \mathbb{Z}^d$.
By a linear transformation one may map $\tilde K$ to the unit ball $B(0,1)$, and the problem
then becomes one of determining whether $(n^2\alpha_1',\ldots,n^2\alpha_d') \in B(0,1) + \Lambda$, for a lattice
$\Lambda \subseteq \mathbb{R}^d$. Schmidt's result says that this is so if $N$ is suitably large depending on $\det(\Lambda)$
and, as we remarked, it is essentially proved by induction on the dimension of $\Lambda$.

Our approach will be more-or-less the same. However we make the observation that
a rather natural smooth approximation to the characteristic function of $B(0,1) + \Lambda$ is
provided by the theta function associated to $\Lambda$. This is particularly so if one wishes to
do harmonic analysis, as the Poisson summation formula takes a very pleasant form.

Definition A.3 (Theta functions). Suppose that $\Lambda$ is a lattice of full rank in $\mathbb{R}^d$. For
any $t > 0$ and $x = (x_1,\ldots,x_d) \in \mathbb{R}^d$, we define the theta function

\[ \Theta_\Lambda(t,x) := \sum_{m \in \Lambda} e^{-\pi t|x - m|^2} \]

where $|x| := (x_1^2 + \ldots + x_d^2)^{1/2}$ is the usual Euclidean norm.

Remark. For most of this appendix, one should think of $\Theta_\Lambda(t,\cdot)$ as a blurred version
of the characteristic function of the set obtained by placing a Euclidean ball of radius
$\sim t^{-1/2}$ about every point of $\Lambda$.

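From the Poisson summation formula we have the fundamental identity

\[ \sum_{m \in \Lambda} e^{-\pi t|x - m|^2} = \frac{1}{t^{d/2}\det(\Lambda)} \sum_{\xi \in \Lambda^*} e^{-\pi|\xi|^2/t}\, e(\xi \cdot x) \tag{A.3} \]

where $\Lambda^* := \{\xi \in \mathbb{R}^d : \xi \cdot m \in \mathbb{Z} \text{ for all } m \in \Lambda\}$ is the dual lattice of $\Lambda$.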
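
The identity (A.3) is easy to test numerically in the simplest case $d = 1$, $\Lambda = R\mathbb{Z}$, where $\det(\Lambda) = R$ and $\Lambda^* = R^{-1}\mathbb{Z}$. The following is a minimal sketch; the truncation parameter `terms` and the sample values of $t$, $x$, $R$ are assumptions made for illustration. The imaginary parts of $e(\xi \cdot x)$ cancel in pairs $\pm\xi$, so only cosines appear.

```python
import math

def theta_direct(t, x, R, terms=60):
    """Left-hand side of (A.3) for the lattice R*Z, truncated to |m| <= terms."""
    return sum(math.exp(-math.pi * t * (x - R * m) ** 2)
               for m in range(-terms, terms + 1))

def theta_dual(t, x, R, terms=60):
    """Right-hand side of (A.3): a sum over the dual lattice (1/R)*Z."""
    total = sum(math.exp(-math.pi * (k / R) ** 2 / t) * math.cos(2 * math.pi * k * x / R)
                for k in range(-terms, terms + 1))
    return total / (math.sqrt(t) * R)

t, x, R = 0.7, 0.3, 1.5   # hypothetical sample values
lhs = theta_direct(t, x, R)
rhs = theta_dual(t, x, R)
```

Both sums converge extremely quickly, so the truncation error is far below the tolerance of double precision arithmetic for these values.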

The determinant $\det(\Lambda)$ is, of course, an important quantity associated with the lattice
$\Lambda$. In our argument, however, a somewhat different quantity will play a more prominent
role.

Definition A.4 (Definition of $A_\Lambda$). Let $\Lambda$ be a lattice of full rank in $\mathbb{R}^d$. Define

\[ A_\Lambda := \Theta_{\Lambda^*}(1,0) = \sum_{\xi \in \Lambda^*} e^{-\pi|\xi|^2} = \det(\Lambda) \sum_{m \in \Lambda} e^{-\pi|m|^2}. \tag{A.4} \]

Remark. The last equality follows from (A.3). $1/A_\Lambda$ may be thought of as a kind of
measure of how likely it is that a random point in $\mathbb{R}^d$ lies within $O(1)$ of $\Lambda$.

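Now let $\alpha = (\alpha_1,\ldots,\alpha_d) \in \mathbb{R}^d$ and let $N > 0$. We define the quantity

\[ F_{\Lambda,\alpha}(N) := \det(\Lambda)\, \mathbb{E}_{-N \leq n \leq N} \Theta_\Lambda(1, n^2\alpha). \]

From (A.3) we have

\[ F_{\Lambda,\alpha}(N) = \sum_{\xi \in \Lambda^*} e^{-\pi|\xi|^2}\, \mathbb{E}_{-N \leq n \leq N}\, e(n^2 \xi \cdot \alpha). \tag{A.5} \]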

We will work towards a lower bound for $F_{\Lambda,\alpha}(N)$. The precise statement of this bound
may be found in Proposition A.9 below. Once this is available, a straightforward truncation
argument can be used to show that $n^2\alpha$ is often within $O(1)$ of the lattice $\Lambda$.
Rescaling suitably, one may insist that $n^2\alpha$ is within $\varepsilon$ of $\Lambda$ under appropriate conditions,
and Proposition A.2 follows. We postpone the details until the end of the section,
focussing for now on the much more interesting issue of a lower bound for $F_{\Lambda,\alpha}(N)$.

Later on we will need the following list of simple but slightly technical properties of
$F_{\Lambda,\alpha}$. The reader may care to skip the next lemma on a first reading.

Lemma A.5 (Properties of $F_{\Lambda,\alpha}$). Let $\Lambda$ be a lattice of full rank in $\mathbb{R}^d$, let $\alpha \in \mathbb{R}^d$, and
let $N > 0$.

(i) (Contraction of $N$) For any $c \in (10/N, 1)$, we have $F_{\Lambda,\alpha}(N) \gg c\,F_{\Lambda,\alpha}(cN)$.

(ii) (Dilation of $\alpha$) For any integer $q \geq 1$, we have $F_{\Lambda,\alpha}(N) \gg \frac{1}{q} F_{\Lambda,q^2\alpha}(N/q)$.

(iii) (Stability) If $\tilde\alpha \in \mathbb{R}^d$ is such that $|\alpha - \tilde\alpha| \leq \varepsilon N^{-2}$ for some $\varepsilon \in (0,1)$, then
$F_{\Lambda,\alpha}(N) \gg F_{(1+\varepsilon)\Lambda,(1+\varepsilon)\tilde\alpha}(N)$.

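Proof. The bound (i) follows immediately from the definition of $F_{\Lambda,\alpha}$ and the positivity of
$\Theta$. The bound (ii) also follows immediately from the definition of $F_{\Lambda,\alpha}$, restricting the $n$
variable to multiples of $q$. We now turn to the stability estimate (iii). If $|\alpha - \tilde\alpha| \leq \varepsilon N^{-2}$
then

\[ \big|\, |n^2\alpha - m| - |n^2\tilde\alpha - m| \,\big| \leq \varepsilon \]

for all $n$, $-N \leq n \leq N$, and all $m \in \Lambda$. Write, temporarily, $X := |n^2\alpha - m|$ and
$\tilde X := |n^2\tilde\alpha - m|$. If $X \geq 2$ then we have the inequality

\[ \pi(1+\varepsilon)^2 (X - \varepsilon)^2 \geq \pi X^2, \]

and so, since $\tilde X \geq X - \varepsilon$,

\[ e^{-\pi X^2} \geq e^{-(1+\varepsilon)^2 \pi \tilde X^2}. \]

If $X < 2$ then $e^{-\pi X^2} > c$, and so

\[ e^{-\pi X^2} > c\, e^{-(1+\varepsilon)^2 \pi \tilde X^2} \]

in this case. Thus in both cases we have

\[ e^{-\pi X^2} \gg e^{-(1+\varepsilon)^2 \pi \tilde X^2}. \]

Substituting for $X$, $\tilde X$, summing this in $m$ and averaging in $n$, the claim follows. □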

The next lemma is the key ingredient in our argument. It formalises the idea that
everything is relatively straightforward unless there is a "diophantine" relation amongst
the $\alpha_j$.

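Lemma A.6 (Schmidt's alternative). Suppose that $\alpha \in \mathbb{R}^d$ and that $\Lambda \subseteq \mathbb{R}^d$ is a full-rank
lattice. Let $N > 0$ be an integer. One of the following two alternatives always
holds:

(i) $F_{\Lambda,\alpha}(N) \geq 1/2$;

(ii) There is a positive integer $q \ll dA_\Lambda^{C}$ and some primitive $\xi \in \Lambda^* \setminus \{0\}$ such that

\[ |\xi| \ll \sqrt{d} + \sqrt{\log A_\Lambda} \tag{A.6} \]

and

\[ \|q\,\xi \cdot \alpha\|_{\mathbb{R}/\mathbb{Z}} \ll A_\Lambda^{C} N^{-2}. \tag{A.7} \]

Remark. We say that $\xi \in \Lambda^*$ is primitive if $\xi/n \notin \Lambda^*$ for any integer $n \geq 2$.
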
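Proof. Suppose that (i) fails to hold. Then from (A.5) and the triangle inequality we
have, using (A.3), that

\[ \sum_{\xi \in \Lambda^* \setminus \{0\}} e^{-\pi|\xi|^2}\, |\mathbb{E}_{-N \leq n \leq N}\, e(n^2 \xi \cdot \alpha)| > 1/2. \tag{A.8} \]

Our first task is to truncate this. To this end let $M \geq 1$ be a cutoff parameter to be
chosen later. We have

\[ \sum_{\xi \in \Lambda^* : |\xi| \geq M} e^{-\pi|\xi|^2}\, |\mathbb{E}_{-N \leq n \leq N}\, e(n^2 \xi \cdot \alpha)| \leq \sum_{\xi \in \Lambda^* : |\xi| \geq M} e^{-\pi|\xi|^2} \]
\[ \leq e^{-\pi M^2/2} \sum_{\xi \in \Lambda^*} e^{-\pi|\xi|^2/2} \]
\[ = e^{-\pi M^2/2}\, 2^{d/2} \det(\Lambda) \sum_{m \in \Lambda} e^{-2\pi|m|^2}. \]

Choosing $M := C(\sqrt{d} + \sqrt{\log A_\Lambda})$ for suitable $C$ we may clearly make this less than
$1/4$, and hence from (A.8) we have

\[ \sum_{\xi \in \Lambda^* : 0 < |\xi| < M} e^{-\pi|\xi|^2}\, |\mathbb{E}_{-N \leq n \leq N}\, e(n^2 \xi \cdot \alpha)| > 1/4. \]

From the definition of $A_\Lambda$ this implies that there is $\xi \in \Lambda^* \setminus \{0\}$, $|\xi| \leq M$, such that

\[ |\mathbb{E}_{-N \leq n \leq N}\, e(n^2 \xi \cdot \alpha)| > 1/4A_\Lambda. \tag{A.9} \]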

This puts us in the situation covered by Weyl's inequality, a discussion of which may 
be found in ^SJ Chapter 3] or [2HJ Chapter 2]. The following formulation of the result 
follows easily from the standard one as given in those two references; see also [T3*l Lemma 
A.13]. 

Weyl's Inequality. Let $\theta \in \mathbb{R}$, let $\delta \in (0,1)$ and suppose that $N > 0$ is an integer
such that $|\mathbb{E}_{-N \leq n \leq N}\, e(n^2\theta)| \geq \delta$. Then there exists a positive integer $q \ll \delta^{-C_1}$ such that
$\|q\theta\|_{\mathbb{R}/\mathbb{Z}} \ll \delta^{-C_2} N^{-2}$.

Remark. For us the exact values of $C_1, C_2$ are unimportant, but it is possible to take
$C_1 = 2$ and $C_2$ to be any number larger than 2.
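
The dichotomy behind Weyl's inequality is visible in a direct computation of the averages $|\mathbb{E}_{-N \leq n \leq N}\, e(n^2\theta)|$. The sketch below is an illustration with hypothetical sample values: for $\theta = 1/3$ (so that $q = 3$ gives $\|q\theta\|_{\mathbb{R}/\mathbb{Z}} = 0$) the average stays bounded away from zero, whereas for $\theta = \sqrt{2}$ it is much smaller.

```python
import cmath
import math

def weyl_mean(theta, N):
    """Absolute value of the average of e(n^2 * theta) over -N <= n <= N."""
    total = sum(cmath.exp(2j * math.pi * ((n * n * theta) % 1.0))
                for n in range(-N, N + 1))
    return abs(total) / (2 * N + 1)

N = 300
rational = weyl_mean(1 / 3, N)             # theta with a small denominator
irrational = weyl_mean(math.sqrt(2), N)    # theta with no small denominator
```

For $\theta = 1/3$ the phase $e(n^2/3)$ takes the value 1 when $3 \mid n$ and $e(1/3)$ otherwise, so the average has modulus close to $|1 + 2e(1/3)|/3 = 1/\sqrt{3}$.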



The bounds (A.6) and (A.7) follow immediately from Weyl's inequality and (A.9). It remains to show
that $\xi$ can be chosen to be primitive. There is certainly a natural number $n$ such that
$\xi/n$ lies in $\Lambda^*$ and is primitive. Setting $\tilde\xi := \xi/n$ and $\tilde q := nq$ it is clear that the bounds
(A.6) and (A.7) are preserved. We must show that $\tilde q \ll dA_\Lambda^{C}$ for some absolute $C$. To
do this we note from (A.4) that if $\lambda^* \in \Lambda^* \setminus \{0\}$ is arbitrary then

\[ A_\Lambda \gg 1/|\lambda^*| \tag{A.10} \]

(consider the cases $|\lambda^*| \leq 1$ and $|\lambda^*| > 1$ separately). It follows from this, (A.6) and a
crude bound that

\[ n = \frac{|\xi|}{|\tilde\xi|} \ll A_\Lambda(\sqrt{d} + \sqrt{\log A_\Lambda}) \ll dA_\Lambda^{2}. \]

The alternative lemma follows immediately. □

We will shortly combine the alternative lemma with some additional arguments which
allow us to make progress when case (ii) holds. We first isolate a simple but important
lemma that will be needed.

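Lemma A.7 (Descent). Suppose that $\Lambda' \subseteq \mathbb{R}^{d-1}$ and $\Lambda \subseteq \mathbb{R}^d$ are full-rank lattices, and
that $\Lambda' \subseteq \Lambda$, where we are regarding $\mathbb{R}^{d-1}$ as a subset of $\mathbb{R}^d$ in the usual way. Suppose
that $\alpha' \in \mathbb{R}^{d-1}$, that $\alpha \in \mathbb{R}^d$ and that $\alpha - \alpha' \in \Lambda$. Then

\[ F_{\Lambda,\alpha}(N) \geq \frac{\det(\Lambda)}{\det(\Lambda')} F_{\Lambda',\alpha'}(N). \]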

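Proof. By definition we have

\[ F_{\Lambda,\alpha}(N) = \det(\Lambda)\, \mathbb{E}_{-N \leq n \leq N} \sum_{m \in \Lambda} e^{-\pi|n^2\alpha - m|^2} \]

and there is a similar expression for $F_{\Lambda',\alpha'}$. Now by translation invariance and positivity
we have, for each fixed $n$,

\[ \sum_{m \in \Lambda} e^{-\pi|n^2\alpha - m|^2} = \sum_{m \in \Lambda} e^{-\pi|n^2\alpha' - m|^2} \geq \sum_{m' \in \Lambda'} e^{-\pi|n^2\alpha' - m'|^2}. \]

The result follows upon taking expectations over $n$. □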

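Proposition A.8 (Inductive lower bound on $F_{\Lambda,\alpha}$). Suppose that $\alpha \in \mathbb{R}^d$ and that
$\Lambda \subseteq \mathbb{R}^d$ is a full-rank lattice. Let $N > d^{C'}A_\Lambda^{C'}$ be an integer. Then either $F_{\Lambda,\alpha}(N) \geq 1/2$
or else there is an $\alpha' \in \mathbb{R}^{d-1}$, a full-rank lattice $\Lambda' \subseteq \mathbb{R}^{d-1}$ with

\[ A_{\Lambda'} \ll (\sqrt{d} + \sqrt{\log A_\Lambda})\, A_\Lambda, \tag{A.11} \]

and an $N' \gg d^{-C}A_\Lambda^{-C} N$ such that

\[ F_{\Lambda,\alpha}(N) \gg d^{-C}A_\Lambda^{-C} F_{\Lambda',\alpha'}(N'). \]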

Proof. We begin by applying the alternative lemma. We may clearly assume that we
are in case (ii), that is to say there exists a primitive $\xi \in \Lambda^* \setminus \{0\}$ and a $q \ll dA_\Lambda^{C}$
such that (A.6) and (A.7) are satisfied. By subjecting $\alpha$ and $\Lambda$ to a rotation, we may
assume without loss of generality that $\xi = |\xi| e_d$ is a multiple of the basis vector $e_d$. Now
multiplying through by $q$ we see from (A.7) that

\[ \|\xi \cdot q^2\alpha\|_{\mathbb{R}/\mathbb{Z}} \ll dA_\Lambda^{C} N^{-2}. \]

Recalling (A.10), we can find $\beta \in \mathbb{R}^d$ such that $\xi \cdot \beta \in \mathbb{Z}$ and

\[ |\beta - q^2\alpha| \leq |\xi|^{-1} \|\xi \cdot q^2\alpha\|_{\mathbb{R}/\mathbb{Z}} \ll dA_\Lambda^{C} N^{-2}. \]

In particular we may choose

\[ N_* \gg d^{-C}A_\Lambda^{-C} N \]

such that

\[ |\beta - q^2\alpha| \leq N_*^{-2}/d. \tag{A.12} \]

Now $\xi$ is primitive, and so there is $m \in \Lambda$ so that $\xi \cdot \beta = \xi \cdot m$. Since $\xi = |\xi| e_d$ this
means that we may write $\beta = \beta' + m$ where $\beta' \in \mathbb{R}^{d-1}$.

Now by Lemma A.5 (i) we have

\[ F_{\Lambda,\alpha}(N) \gg d^{-C}A_\Lambda^{-C} F_{\Lambda,\alpha}(N_*). \]

Note that our lower bound on $N$ ensures that the value of $c$ in that lemma can be taken
to be at least $10/N$, as required. By Lemma A.5 (ii) and the fact that $q \ll dA_\Lambda^{C}$ this
implies that

\[ F_{\Lambda,\alpha}(N) \gg d^{-C}A_\Lambda^{-C} F_{\Lambda,q^2\alpha}(N_*/q). \]

Lemma A.5 (iii) and (A.12) allow us to assert that

\[ F_{\Lambda,\alpha}(N) \gg d^{-C}A_\Lambda^{-C} F_{(1+1/d)\Lambda,(1+1/d)\beta}(N_*/q). \]

From Lemma A.7 we obtain

\[ F_{\Lambda,\alpha}(N) \gg d^{-C}A_\Lambda^{-C}\, \frac{\det(\Lambda)}{\det(\Lambda \cap \mathbb{R}^{d-1})}\, F_{\Lambda',\alpha'}(N'), \tag{A.13} \]

where $\alpha' := (1 + 1/d)\beta'$, $\Lambda' := (1 + 1/d)\Lambda \cap \mathbb{R}^{d-1}$ and $N' := N_*/q$. The claimed bound
on $N'$ follows immediately from the lower bound on $N_*$ and the upper bound $q \ll dA_\Lambda^{C}$.

Now since $\xi$ is primitive and parallel to $e_d$ we have $\det(\Lambda^*) = |\xi| \det((\Lambda \cap \mathbb{R}^{d-1})^*)$. Since
$\det(\Pi)\det(\Pi^*) = 1$ for any lattice $\Pi$, the ratio of determinants in (A.13) is $|\xi|^{-1}$. In
view of the upper bound (A.6), this may be absorbed into the $d^{-C}A_\Lambda^{-C}$ factor, and we
therefore obtain the claimed lower bound on $F_{\Lambda,\alpha}(N)$.

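It remains to place an upper bound on $A_{\Lambda'}$. Note first that by positivity we have

\[ \frac{A_{\Lambda \cap \mathbb{R}^{d-1}}}{\det(\Lambda \cap \mathbb{R}^{d-1})} \leq \frac{A_\Lambda}{\det(\Lambda)}, \]

and so from the previous discussion and (A.6) we have

\[ A_{\Lambda \cap \mathbb{R}^{d-1}} \leq |\xi| A_\Lambda \ll (\sqrt{d} + \sqrt{\log A_\Lambda})\, A_\Lambda. \tag{A.14} \]

Secondly for any lattice $\Pi$ and any $\delta > 0$ we clearly have

\[ \sum_{m \in (1+\delta)\Pi} e^{-\pi|m|^2} \leq \sum_{m \in \Pi} e^{-\pi|m|^2}, \]

and so

\[ A_{\Lambda'} \leq (1 + 1/d)^{d} A_{\Lambda \cap \mathbb{R}^{d-1}} \ll A_{\Lambda \cap \mathbb{R}^{d-1}}. \]

Combining this with (A.14), we obtain the required upper bound on $A_{\Lambda'}$. □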

Iterating this proposition leads in a straightforward manner to the claimed lower bound
on $F_{\Lambda,\alpha}(N)$.

Proposition A.9 (Lower bound for $F_{\Lambda,\alpha}$). Let $\alpha \in \mathbb{R}^d$, suppose that $\Lambda \subseteq \mathbb{R}^d$ is a lattice
of full rank with $\det(\Lambda) \geq 1$, and let $N > 0$ be an integer. Then we have the lower
bound

\[ F_{\Lambda,\alpha}(N) \gg d^{-Cd} A_\Lambda^{-Cd}. \]

Proof. If $N < d^{C_0 d} A_\Lambda^{C_0 d}$ then the result is immediate from the trivial lower bound

\[ F_{\Lambda,\alpha}(N) \geq \det(\Lambda)/(2N + 1). \]

Suppose then that $N \geq d^{C_0 d} A_\Lambda^{C_0 d}$ for some suitably large $C_0$. Set $\alpha_0 := \alpha$, $\Lambda_0 := \Lambda$
and $N_0 := N$. Apply Proposition A.8 repeatedly, obtaining vectors $\alpha_j \in \mathbb{R}^{d-j}$, lattices
$\Lambda_j \subseteq \mathbb{R}^{d-j}$ and integers $N_j$ for $j = 0, 1, \ldots$. We will show in a short while that
$N_j > d^{C'} A_{\Lambda_j}^{C'}$ throughout this iteration, and so it is indeed valid to continue applying
Proposition A.8. If, at some point, we pass through case (i) of the alternative lemma
(which leads to the lower bound $F_{\Lambda,\alpha}(N) \geq 1/2$) then we stop the iteration. The worst
bounds result from when this is not the case, and the iteration proceeds all the way to
$d = 0$. Note that we have $F_{\Lambda,\alpha}(N) = 1$ when $d = 0$. The growth of $A_{\Lambda_j}$ during the
iteration is controlled by (A.11). Noting that $A_\Lambda \geq \det(\Lambda) \geq 1$, we may employ the
crude inequality

\[ \sqrt{d} + \sqrt{\log X} \ll dX^{1/d} \]

for $X \geq 1$. Using this it is easy to see from (A.11) that

\[ A_{\Lambda_j} \ll A_{\Lambda_0}^{C} \]

for the duration of the iteration. Since

\[ N_{j+1} \gg d^{-C} A_{\Lambda_j}^{-C} N_j \]

for all $j$, this confirms that $N_j > d^{C'} A_{\Lambda_j}^{C'}$ throughout provided that $C_0$ is chosen large
enough. Since

\[ F_{\Lambda_j,\alpha_j}(N_j) \gg d^{-C} A_{\Lambda_j}^{-C} F_{\Lambda_{j+1},\alpha_{j+1}}(N_{j+1}), \]

it also provides the desired lower bound on $F_{\Lambda,\alpha}(N)$. □

It remains to deduce Proposition A.2. This is achieved by a truncation argument.

Proof of Proposition A.2. Let $R$ be a quantity to be chosen later. We will need $R \geq C_0 d$
for some large absolute constant $C_0$. Apply Proposition A.9 with $\alpha := (R\alpha_1,\ldots,R\alpha_d)$ and
$\Lambda := R\mathbb{Z}^d$. We have

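\[ A_\Lambda = R^d \Big(\sum_{m \in \mathbb{Z}} e^{-\pi R^2 m^2}\Big)^d \leq (CR)^d, \]

and so (since $R \geq C_0 d$) that proposition implies that

\[ F_{\Lambda,\alpha}(N) \gg R^{-Cd^2}. \]

Since $\det(\Lambda) = R^d$, it follows from the definition of $F_{\Lambda,\alpha}$ that

\[ \mathbb{E}_{-N \leq n \leq N} \sum_{m \in R\mathbb{Z}^d} e^{-\pi|n^2\alpha - m|^2} \gg R^{-Cd^2}. \tag{A.15} \]

The contribution of the $n = 0$ term is $\ll (CR)^d/N$, which is negligible if $N \geq CR^{Cd^2}$
for suitably large $C$. In this case we conclude that there is $n \in \{1,\ldots,N\}$ such that

\[ \sum_{m \in R\mathbb{Z}^d} e^{-\pi|n^2\alpha - m|^2} \gg R^{-Cd^2}. \]

Fix this $n$. If we had $|n^2\alpha - m| > \sqrt{R}$ for all $m \in R\mathbb{Z}^d$ then we would have

\[ e^{-\pi|n^2\alpha - m|^2} \leq e^{-\pi R^2/2}\, e^{-\pi|n^2\alpha - m|^2/2} \]
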
for all $m \in R\mathbb{Z}^d$. Summing in $m$ and using (A.3) and (A.4), we conclude that

\[ \sum_{m \in R\mathbb{Z}^d} e^{-\pi|n^2\alpha - m|^2} \leq e^{-\pi R^2/2}\, \frac{2^{d/2}}{\det(\Lambda)} \sum_{\xi \in \Lambda^*} e^{-2\pi|\xi|^2}\, e(\xi \cdot n^2\alpha) \leq e^{-\pi R^2/2}\, 2^{d/2}\, \frac{A_\Lambda}{\det(\Lambda)}, \]

which is $\leq e^{-\pi R^2/2}(CR)^d$. Recall that $R \geq C_0 d$; if $C_0$ is chosen large enough then this
will contradict (A.15). We are thus forced to conclude that there is some $m \in R\mathbb{Z}^d$ such
that $|n^2\alpha - m| \leq \sqrt{R}$, and this clearly implies that $\|n^2\alpha_j\|_{\mathbb{R}/\mathbb{Z}} \leq 1/\sqrt{R}$ for $j = 1,\ldots,d$.

We have shown that if $N \geq CR^{Cd^2}$ and $R \geq C_0 d$ then there is some $n$, $1 \leq n \leq N$, such
that $\|n^2\alpha_j\|_{\mathbb{R}/\mathbb{Z}} \leq 1/\sqrt{R}$ for $j = 1,\ldots,d$. If $N \geq C'd^{C'd^2}$ for some suitably large $C'$
then the proposition follows by choosing $R = d^{-1}N^{c/d^2}$ for some small absolute constant
$c > 0$; if instead $N < C'd^{C'd^2}$ then the proposition is trivial. □